Version 16 (modified by dgoll, 4 years ago)
Spin up with a Machine Learning approach
What is it about?
Aim: develop a spinup acceleration procedure that is independent of the model version. The idea is to develop a Python tool set that can be applied to the ORCHIDEE family of models.
How can I contribute to this effort?
Please contact D. Goll if you want to join. Examples of contributions we would benefit from:
- data from conventional spinup simulations
- expertise in linking it to other tools such as libIGCM, ORCHIDAS, etc.
- expertise in hosting, distributing, and maintaining the software
- expertise in machine learning and Python
Task force members
Daniel Goll, Yan Sun, Jinfeng Chang, Yilong Wang, Yuanyuan Huang, Vladislav Bastrikov, Nicolas Viovy, Matt McGrath
Status reports
26/01/2021
- DONE: Proof of concept for ORCHIDEE-CNP v1.2
- ONGOING: Finding a common setup for pixel selection applicable to all ORCHIDEE versions
- ONGOING: Collecting data from other ORCHIDEE versions for testing
- ONGOING: Translating MATLAB code into Python
- ONGOING: Cleaning the code
- ONGOING: Recruiting task force members
16/02/2021
Yan gave a presentation on progress with the Python coding, results for CNP and trunk, and the timeline for the next 2 months.
- Input files: restart + climate forcing (not the hist file, as ORCHIDEE might introduce noise)
- K-means clustering: add a plot of the total distance vs. k to check whether the number-of-clusters parameter is well chosen (part of the monitoring information for the user)
- Add checks and quality statistics to monitor whether each step performs well, and stop the procedure if results fail minimum quality criteria (e.g. stop if the machine learning fails to predict the training pixels)
- Externalize all parameters of the routines into one file.
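The total distance vs. k plot mentioned in the K-means item can be fed from a loop that runs k-means for a range of k and records the total within-cluster squared distance. The sketch below is dependency-free and rests on simplifying assumptions: plain Lloyd iterations with a deterministic farthest-point initialization (not necessarily what the tools use), squared Euclidean distance, and synthetic two-dimensional pixel features. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_total_distance(points, k, n_iter=50):
    """Run Lloyd's algorithm and return the total within-cluster squared distance."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_curve(points, k_values):
    """Total distance for each candidate k; these are the values to plot against k."""
    return {k: kmeans_total_distance(points, k) for k in k_values}


# Synthetic example: three well-separated groups of pixels in a 2-D feature space.
offsets = [(-0.5 + 0.5 * i, -0.5 + 0.5 * j) for i in range(3) for j in range(3)]
pixels = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 10), (20, 0)] for dx, dy in offsets]
curve = elbow_curve(pixels, range(1, 6))
for k, total in curve.items():
    print(k, round(total, 2))
```

For this synthetic data the total distance drops sharply up to k = 3 and flattens afterwards, which is the kind of "elbow" a user would look for in the monitoring plot.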
Work distribution:
- Matt: provide trunk v4.0 data (EQ files + results from 200 yr after starting from scratch without analytical spinup)
- Yilong: refine and extend the coding of tools 1 and 2
- Everyone: run tests with the refined tools for other forcings
- Yan will focus next month on her PhD defence (20 March)
03/03/2021
- A first version of the Python tools is available for testing
- Yilong gave an overview
Next steps:
- put code and documentation on GitHub (Daniel, Vlad, Yilong)
- add documentation on how to run the tools; adapt them to other models (Yan, Yilong)
- everyone attempts to run the tools with their own model data (keep a log on GitHub of which model data were used)
Information/suggestions on running the tools:
- user specification files: more information is needed, e.g. which file name corresponds to the equilibrium information and which to the information from the transient run (Yan)
- things to improve: figure labelling, user spec file (simplify)
- try to use qsub to avoid blocking nodes on obelix
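One way to address the user specification points above is to keep the settings in a single JSON file and validate it before any tool runs, so that a missing entry is reported together with a description of what it should contain. The sketch below is illustrative only: the key names (`equilibrium_restart`, `transient_restart`, `climate_forcing`, `n_clusters`) and the file names are hypothetical and are not taken from the actual tools.

```python
import json

# Hypothetical required keys, each with a description shown to the user on error.
REQUIRED_KEYS = {
    "equilibrium_restart": "restart file from the conventional (equilibrium) spinup",
    "transient_restart": "restart file from the transient run",
    "climate_forcing": "climate forcing file",
    "n_clusters": "number of clusters k used for the pixel selection",
}


def load_user_spec(text):
    """Parse a JSON user specification and report all missing keys at once."""
    spec = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        details = "; ".join(f"{key} ({REQUIRED_KEYS[key]})" for key in missing)
        raise ValueError(f"user specification is missing: {details}")
    return spec


# Example specification with placeholder file names.
example = """
{
  "equilibrium_restart": "eq_restart.nc",
  "transient_restart": "transient_restart.nc",
  "climate_forcing": "forcing.nc",
  "n_clusters": 4
}
"""
spec = load_user_spec(example)
print(spec["equilibrium_restart"])
```

Keeping the descriptions next to the key names means the error message doubles as documentation for the specification file.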
16/03/2021
- GitHub has been set up, and some initial tests and exchanges were done
- next: everyone tries and tests the tool on the two available datasets (CNP, trunk); report bugs, improvements, etc. on GitHub
- ongoing: acquire data from other models (versions): CABLE, ORCHIDEE-MICT, ORCHIDEE-<any>
- next meeting will be scheduled after discussion with Yan after her defence
01/04/2021
- GitHub code status: YY could run the code, and DG ran some tests with modified inputs; all detected (minor) problems are listed as issues on GitHub
- TODO1 (Yan): explain in the README how to insert data from other simulations; separate the user specification files into experiment-specific settings (e.g. path to model output, forcing period (for tool 3), etc.) and model-version-specific settings (e.g. CNP, MICT, trunk, CABLE, etc.).
- TODO2 (Yan): provide a tool 2 output which condenses the information currently spread over multiple files into a single file.
- TODO3 (Yan): work on the manuscript (incl. results from tests with other model versions (if feasible from TODO4) and CABLE)
- TODO4 (YY, DG, all): test tools 1 and 2 when TODO1 and TODO2 are ready.
- TODO5 (DG): discuss the running scripts with the project team.
- TODO6 (Yan): code an evaluation tool (tool 3); check criteria are (1) high priority (total land C stock), (2) medium priority (land C stock per pixel), (3) others / drift over the forcing period (i.e. climate loop).
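The three check criteria listed for tool 3 could be computed along the following lines. This is a minimal sketch, not the tool's actual code: the function names, the use of relative errors, the area weighting, and the least-squares slope as a drift measure are all assumptions made for illustration. Pixel stocks are assumed to be densities (per unit area) and nonzero in the reference.

```python
def total_stock_error(predicted, reference, areas):
    """Criterion 1: relative error of the area-weighted total land C stock."""
    total_pred = sum(p * a for p, a in zip(predicted, areas))
    total_ref = sum(r * a for r, a in zip(reference, areas))
    return (total_pred - total_ref) / total_ref


def pixel_stock_errors(predicted, reference):
    """Criterion 2: relative error of the land C stock on each pixel."""
    return [(p - r) / r for p, r in zip(predicted, reference)]


def drift_per_year(yearly_totals):
    """Criterion 3: least-squares slope of the total stock over the forcing period."""
    n = len(yearly_totals)
    mean_t = (n - 1) / 2
    mean_y = sum(yearly_totals) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(yearly_totals))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


# Made-up numbers: ML-predicted vs. conventionally spun-up stocks on three pixels.
predicted = [10.0, 21.0, 29.0]
reference = [10.0, 20.0, 30.0]
areas = [1.0, 1.0, 2.0]

print(total_stock_error(predicted, reference, areas))
print(pixel_stock_errors(predicted, reference))
print(drift_per_year([100.0, 101.0, 102.0, 103.0]))
```

A slope close to zero over one climate loop would indicate that the predicted state is near equilibrium; what counts as "close" is a threshold the task force would need to set.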
14/04/2021
Progress since last meeting:
- update of README
- bug detected for biomass pool
- evaluation tool for developers
To do
- push development updates to GitHub (e.g. README) (Yan)
- produce an evaluation tool to test whether the ML works for the training sites (Yan)
- send the data location for the CNP-MIMICS runs and MICT runs (Yan)
- adapt the tool for MIMICS and MICT (Daniel + all) to test the tool structure and documentation
- produce restart files for CABLE (Yan)
- finalize the paper within 4 weeks
- next meeting in 3 weeks due to Yan's move
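A check of whether the ML reproduces the training sites could take the following form: compute the coefficient of determination (R2) between the predicted and reference values on the training pixels, and abort when it falls below a threshold. This is a sketch under assumptions: R2 as the metric, the threshold of 0.9, and all names are illustrative choices, not those of the actual tools.

```python
class QualityCheckError(RuntimeError):
    """Raised when a step fails its minimum quality criterion."""


def r_squared(reference, predicted):
    """Coefficient of determination; assumes the reference values are not all equal."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def check_training_fit(reference, predicted, min_r2=0.9):
    """Return R2 on the training pixels, or stop the procedure if it is too low."""
    score = r_squared(reference, predicted)
    if score < min_r2:
        raise QualityCheckError(
            f"training-pixel R2 = {score:.3f} is below the minimum of {min_r2}"
        )
    return score


# Made-up values on four training pixels.
reference = [1.0, 2.0, 3.0, 4.0]
good_prediction = [1.1, 1.9, 3.2, 3.9]
bad_prediction = [4.0, 3.0, 2.0, 1.0]

print(check_training_fit(reference, good_prediction))
try:
    check_training_fit(reference, bad_prediction)
except QualityCheckError as err:
    print("stopped:", err)
```

Raising a dedicated exception lets a driver script stop the whole procedure at the failing step while still reporting which criterion was missed.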
05/05/2021
TODO:
- revise the varlist.json to be more flexible regarding varying variables/dimensions in the restart files of ORCHIDEE versions (Vlad)
- MICT restart file: which variables are needed and which are not? What do the dimensions stand for? (Jinfeng)
- visualization of the quality of the training (Daniel)
- CNP-MIMICS training data (Daniel)
19/05/2021
Progress since last meeting:
- MICT: deepC_a, deepC_s, deepC_p are state variables. The variable carbon stores the depth-integrated SOC information and can be derived from the other three.
- new JSON syntax proposed for more flexibility. NEXT: Yan and Yilong discuss the feasibility of introducing the concept
- evaluation tool: LOOCV (optional for developers), quick check plots (mandatory for users). NEXT: finalize and upload to GitHub
- evaluation tools: different statistical variables to be tested; tradeoff between user-friendliness and information content
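Leave-one-out cross-validation (LOOCV) holds out each training pixel in turn, fits the model on the remaining pixels, and predicts the held-out one. The sketch below uses a 1-nearest-neighbour predictor purely as a stand-in so that the example stays dependency-free; the ML model, predictors, and targets actually used by the tools are not specified here, and all names are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in model: return the target of the training sample closest to x."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def loocv_predictions(features, targets, predict=nearest_neighbour_predict):
    """Predict every sample from a model that has seen all samples except that one."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions


def rmse(reference, predicted):
    """Root-mean-square error, one of several statistics that could be reported."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


# Made-up data: one predictor per pixel and the target value to be learned.
features = [(0.0,), (1.0,), (2.5,), (4.5,)]
targets = [0.0, 2.0, 5.0, 9.0]

held_out = loocv_predictions(features, targets)
print(held_out)
print(rmse(targets, held_out))
```

Because the model is refitted once per pixel, LOOCV cost grows with the number of training pixels, which is consistent with keeping it optional for developers while users rely on the quick check plots.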