= Spin up with a Machine Learning approach =

== What is it about? ==
Aim: develop a spinup acceleration procedure that is independent of the model version. The idea is to develop a Python tool set that can be applied to the ORCHIDEE family of models.

== How can I contribute to this effort? ==
Please contact D. Goll if you want to join. Some examples of contributions we would benefit from:
 * data from conventional spinup simulations
 * expertise on how to link the tool set to other tools, such as libIGCM, ORCHIDAS, etc.
 * expertise on how to host, distribute and maintain the software
 * machine learning, Python

== Task force members ==
Daniel Goll, Yan Sun, Jinfeng Chang, Yilong Wang, Yuanyuan Huang, Vladislav Bastrikov, Nicolas Viovy, Matt McGrath

== Status reports ==
=== 26/01/2021 ===
 * DONE: Proof of concept for ORCHIDEE-CNP v1.2
 * ONGOING: Finding a common setup for pixel selection applicable to all ORCHIDEE versions
 * ONGOING: Collecting data from other ORCHIDEE versions for testing
 * ONGOING: Translating MATLAB code into Python
 * ONGOING: Cleaning the code
 * ONGOING: Recruiting task force members

=== 16/02/2021 ===
Yan gave a presentation on progress with the Python coding, results on CNP and trunk, and the timeline for the next 2 months.
 * Input files: restart + climate forcing (not the hist file, as ORCHIDEE might introduce noise)
 * K-means clustering: add a plot showing the total distance vs. k so that users can check whether the number of clusters is well chosen (part of the monitoring information for the user)
 * Add checks and quality statistics to monitor whether each step performs well, and stop the procedure if the results fail minimum quality criteria (e.g. stop if the machine learning fails to predict the training pixels)
 * Externalize all parameters of the routines in one file.

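The "total distance vs. k" monitoring mentioned in the K-means item above could look roughly like the following. This is a minimal sketch and not the code of the tool set: the function names, the synthetic two-dimensional "pixels" and the range of k are assumptions made for illustration, and the clustering is a plain Lloyd's k-means written with the standard library only (a library implementation such as scikit-learn's `KMeans` would be the more natural choice in practice).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def kmeans_total_distance(points, k, n_iter=50, seed=0):
    """Run Lloyd's k-means and return the total squared distance
    of all points to their nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean(cluster) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def distance_curve(points, k_values):
    """Total distance for each candidate k: the data behind the plot."""
    return {k: kmeans_total_distance(points, k) for k in k_values}


def make_demo_points(seed=1):
    """Hypothetical stand-in for pixel features: three separated groups."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centres
        for _ in range(20)
    ]


if __name__ == "__main__":
    curve = distance_curve(make_demo_points(), range(1, 7))
    for k, total in curve.items():
        print(f"k={k}  total squared distance={total:.1f}")
```

Plotting the returned values against k gives the monitoring figure; a k beyond which the curve flattens is a reasonable candidate for the number of clusters.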
Work distribution:
 * Matt: provide trunk v4.0 data (EQ files + results from 200 yr after scratch without analytical spinup)
 * Yilong: refine and extend the coding of tools 1 & 2
 * Everyone: run tests with the refined tools for other forcings
 * Yan will focus next month on her PhD defence (20 March)

=== 03/03/2021 ===
 * First version of the Python tools is available for testing
 * Yilong gave an overview

Next steps:
 * put code and documentation on GitHub (Daniel, Vlad, Yilong)
 * add documentation on how to run the tools; adapt them to other models (Yan, Yilong)
 * everyone attempts to run the tools with their model data (keep a log on GitHub of which model data were used)

Information/suggestions on running the tools:
 * user specification files: more information is needed, e.g. which file name corresponds to the equilibrium information and which to the information from the transient run (Yan)
 * things to improve: figure labelling, user specification file (simplify)
 * try to use qsub to avoid blocking nodes on obelix

=== 16/03/2021 ===
 * GitHub has been set up and some initial tests and exchanges were done
 * next: everyone tries and tests the tool on the two available datasets (CNP, trunk); report bugs, improvements, etc. on GitHub
 * ongoing: acquire data from other models (versions): CABLE, ORCHIDEE-MICT, ORCHIDEE-
 * the next meeting will be scheduled after discussion with Yan after her defence

=== 01/04/2021 ===
 * GitHub code status: YY could run the code, DG did some tests modifying some inputs; all detected (minor) problems are listed as issues on GitHub
 * TODO1 (Yan): provide information in the README on how to insert data from other simulations; separate the user specification files into experiment-specific (e.g. path to model output, forcing period (for tool 3), etc.) and model-version-specific (e.g. CNP, MICT, trunk, CABLE, etc.) files.
 * TODO2 (Yan): provide a tool 2 output that condenses the information from the current multiple files into a single file.
 * TODO3 (Yan): work on the manuscript (incl.
   results from tests with other model versions (if feasible from TODO4) and CABLE)
 * TODO4 (YY, DG, all): test tools 1 and 2 when TODO1 and TODO2 are ready.
 * TODO5 (DG): discuss the running scripts with the project team.
 * TODO6 (Yan): code an evaluation tool (tool 3); check criteria are (1) high priority (total land C stock), (2) medium priority (land C stock per pixel), (3) others / drift over the forcing period (i.e. climate loop).

=== 14/04/2021 ===
Progress since last meeting:
 * update of README
 * bug detected for biomass pool
 * evaluation tool for developers

To do:
 * push the development to GitHub (e.g. README) (Yan)
 * produce an evaluation tool to test whether the ML works for training sites (Yan)
 * send the data location for CNP-MIMICS runs and MICT runs (Yan)
 * adapt the tool for MIMICS and MICT (Daniel + all) to test the tool structure and documentation
 * produce restart files for CABLE (Yan)
 * finalize the paper within 4 weeks
 * next meeting in 3 weeks due to Yan's move

=== 05/05/2021 ===
TODO:
 * revise varlist.json to be more flexible regarding varying variables/dimensions in the restart files of ORCHIDEE versions (Vlad)
 * MICT restart file: which variables are needed and which are not? What do the dimensions stand for? (Jinfeng)
 * visualization of the quality of the training (Daniel)
 * CNP-MIMICS training data (Daniel)
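
The quality checks discussed in the reports above (stop if the ML fails to reproduce the training pixels; compare total and per-pixel land C stock) could be sketched as follows. This is an illustration only, not the evaluation tool itself: the function and class names, the optional area weighting, the choice of RMSE as per-pixel statistic and the threshold values are all assumptions.

```python
import math


class SpinupQualityError(RuntimeError):
    """Raised when predictions fail the (hypothetical) minimum quality criteria."""


def evaluate_predictions(reference, predicted, areas=None):
    """Compare predicted and reference land C stock on a set of pixels.

    reference, predicted: per-pixel C stock (same units, same order).
    areas: optional per-pixel weights for the total; defaults to 1 per pixel.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equally long")
    if areas is None:
        areas = [1.0] * len(reference)

    # Criterion 1 (high priority): total land C stock.
    total_ref = sum(r * a for r, a in zip(reference, areas))
    total_pred = sum(p * a for p, a in zip(predicted, areas))
    if total_ref == 0:
        raise ValueError("reference total is zero; relative error is undefined")
    total_rel_error = abs(total_pred - total_ref) / abs(total_ref)

    # Criterion 2 (medium priority): land C stock per pixel, summarised as RMSE.
    n = len(reference)
    pixel_rmse = math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n
    )
    return {"total_rel_error": total_rel_error, "pixel_rmse": pixel_rmse}


def check_quality(metrics, max_total_rel_error=0.05, max_pixel_rmse=None):
    """Stop the procedure (raise) if a metric exceeds its assumed threshold."""
    if metrics["total_rel_error"] > max_total_rel_error:
        raise SpinupQualityError(
            f"total C stock off by {metrics['total_rel_error']:.1%}"
        )
    if max_pixel_rmse is not None and metrics["pixel_rmse"] > max_pixel_rmse:
        raise SpinupQualityError(
            f"per-pixel RMSE {metrics['pixel_rmse']:.3g} exceeds {max_pixel_rmse:.3g}"
        )
    return True


if __name__ == "__main__":
    # Made-up numbers, purely to show the call sequence.
    metrics = evaluate_predictions([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
    check_quality(metrics, max_total_rel_error=0.05, max_pixel_rmse=1.0)
    print(metrics)
```

Running the same comparison on the training pixels first gives the early stop condition: if the ML cannot reproduce the data it was trained on, the later steps are not worth running. The drift over the forcing period (criterion 3) is not covered by this sketch.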