Version 16 (modified by dgoll, 4 years ago)

Spin up with a Machine Learning approach

What is it about?

Aim: develop a spinup acceleration procedure which is independent of the model version. The idea is to develop a Python tool set which can be applied to the ORCHIDEE family of models.

How can I contribute to this effort?

Please contact D. Goll if you want to join. Some examples of contributions we would benefit from are:

  • data from conventional spinup simulations
  • expertise in how to link it to other tools, such as libIGCM, ORCHIDAS, etc.
  • expertise in how to host/distribute/maintain the software
  • expertise in machine learning and Python

Task force members

Daniel Goll, Yan Sun, Jinfeng Chang, Yilong Wang, Yuanyuan Huang, Vladislav Bastrikov, Nicolas Viovy, Matt McGrath

Status reports

26/01/2021

  • DONE: Proof of concept for ORCHIDEE-CNP v1.2
  • ONGOING: Finding a common setup for pixel selection applicable to all ORCHIDEE versions
  • ONGOING: Collecting data from other ORCHIDEE versions for testing
  • ONGOING: Translating MATLAB code into Python
  • ONGOING: Cleaning the code
  • ONGOING: Recruiting task force members

16/02/2021

Yan gave a presentation on progress with the Python coding, results for CNP and trunk, and the timeline for the next 2 months.

  • Input files: restart + climate forcing (not the hist file, as ORCHIDEE might introduce noise)
  • K-means clustering: add a plot which shows the total distance vs. k to monitor whether the number-of-clusters parameter is well chosen (part of the monitoring information for the user)
  • Add checks and quality statistics to monitor whether each step performs well, and stop the procedure if results fail minimum quality criteria (e.g. stop if the machine learning fails to predict the training pixels)
  • Externalize all parameters of the routines in one file.
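The "total distance vs. k" monitoring plot mentioned above can be sketched as follows. This is a minimal illustration, not the code of the actual tools: it uses a plain Lloyd's k-means written with the standard library only, synthetic two-dimensional points as a stand-in for pixel features, and hypothetical names (`kmeans_inertia`, `elbow_curve`). It prints the curve values instead of plotting them; the printed pairs could be passed to any plotting library.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run plain Lloyd's k-means; return the total squared distance to the nearest center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_curve(points, k_values, n_restarts=5):
    """Best (lowest) total squared distance over several random restarts, for each k."""
    return {
        k: min(kmeans_inertia(points, k, seed=s) for s in range(n_restarts))
        for k in k_values
    }


# Synthetic stand-in for pixel features: three well-separated groups (assumption).
rng = random.Random(42)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pixels = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blobs
    for _ in range(30)
]

curve = elbow_curve(pixels, range(1, 7))
for k, total in curve.items():
    print(f"k={k}: total squared distance = {total:.1f}")
```

A k beyond which the curve flattens is a reasonable candidate for the number of clusters; the user still has to judge this from the plot.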

Work distribution:

  • Matt: provide trunk v4.0 data (EQ files + results from 200 yr after scratch without analytical spinup)
  • Yilong: refine and extend the coding of tools 1 and 2
  • Everyone: run tests with the refined tools for other forcings
  • Yan will focus on her PhD defence next month (20 March)

03/03/2021

  • A first version of the Python tools is available for testing
  • Yilong gave an overview

Next steps:

  • put code and documentation on GitHub (Daniel, Vlad, Yilong)
  • add documentation on how to run the tools; adapt them to other models (Yan, Yilong)
  • everyone attempts to run the tools with their own model data (keep a log on GitHub of which model data were used)

Information/suggestions on running the tools:

  • user specification files: more information is needed, e.g. which file name corresponds to equilibrium information and which to information from the transient run (Yan)
  • things to improve: figure labelling, user specification file (simplify)
  • try to use qsub to avoid blocking nodes on obelix

16/03/2021

  • GitHub has been set up, and some initial tests and exchanges were done
  • next: everyone tries and tests the tool on the two available datasets (CNP, trunk); report bugs, improvements, etc. on GitHub
  • ongoing: acquire data from other models (or model versions): CABLE, ORCHIDEE-MICT, ORCHIDEE-<any>
  • next meeting will be scheduled after discussion with Yan after her defence

01/04/2021

  • GitHub code status: YY could run the code, DG did some tests modifying some inputs; all detected (minor) problems are listed as issues on GitHub
  • TODO1 (Yan): provide information in the README on how to insert data from other simulations; separate the user specification files into experiment-specific (e.g. path to model output, forcing period (for tool 3), etc.) and model-version-specific (e.g. CNP, MICT, trunk, CABLE, etc.) files.
  • TODO2 (Yan): provide a tool 2 output which condenses the information from the current multiple files into a single file.
  • TODO3 (Yan): work on the manuscript (incl. results from tests with other model versions (if feasible from TODO4) and CABLE)
  • TODO4 (YY, DG, all): test tools 1 and 2 when TODO1 and TODO2 are ready.
  • TODO5 (DG): discuss the running scripts with the project team.
  • TODO6 (Yan): code an evaluation tool (tool 3); check criteria are (1) high priority (total land C stock), (2) medium priority (land C stock per pixel), (3) others / drift over the forcing period (i.e. climate loop).
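The split of the user specification into a model-version-specific part and an experiment-specific part (TODO1) could look roughly like the sketch below. All key names, values and the path are hypothetical placeholders chosen for illustration; they are not taken from the actual specification files. The only design choice shown is that experiment settings override model-version defaults when both define the same key.

```python
import json


def merge_specs(model_spec, experiment_spec):
    """Combine both specifications; experiment-specific values take precedence."""
    merged = dict(model_spec)
    merged.update(experiment_spec)
    return merged


# Hypothetical model-version-specific settings (shared by all runs of one model version).
model_spec_text = """
{
    "model": "ORCHIDEE-CNP",
    "restart_variables": ["var_a", "var_b"],
    "n_clusters": 4
}
"""

# Hypothetical experiment-specific settings (differ from run to run).
experiment_spec_text = """
{
    "restart_path": "/path/to/restart.nc",
    "forcing_years": [1901, 1910],
    "n_clusters": 6
}
"""

model_spec = json.loads(model_spec_text)
experiment_spec = json.loads(experiment_spec_text)
settings = merge_specs(model_spec, experiment_spec)

print(json.dumps(settings, indent=2, sort_keys=True))
```

In practice the two JSON strings would live in separate files, so that a new experiment only requires editing the experiment-specific one.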

14/04/2021

Progress since last meeting:

  • update of README
  • bug detected for biomass pool
  • evaluation tool for developers

To do

  • upload developments to GitHub (e.g. README) (Yan)
  • produce an evaluation tool to test whether the ML works for the training sites (Yan)
  • send the data location for the CNP-MIMICS runs and MICT runs (Yan)
  • adapt the tool for MIMICS and MICT (Daniel + all) to test the tool structure and documentation
  • produce restart files for CABLE (Yan)
  • finalize the paper within 4 weeks
  • next meeting in 3 weeks due to Yan's move
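A check of whether the ML works for the training sites, combined with stopping the procedure when a minimum quality criterion is not met, might be sketched as below. This is an assumption-laden illustration: the coefficient of determination (R²) as the quality statistic, the threshold of 0.9, and the names `check_training_fit` and `QualityCheckError` are all hypothetical and not taken from the tools.

```python
class QualityCheckError(RuntimeError):
    """Raised when a step fails its minimum quality criterion."""


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def check_training_fit(observed, predicted, min_r2=0.9):
    """Return R^2 for the training pixels, or stop if it is below min_r2."""
    r2 = r_squared(observed, predicted)
    if r2 < min_r2:
        raise QualityCheckError(
            f"training pixels poorly predicted: R^2 = {r2:.3f} < {min_r2}"
        )
    return r2


# Made-up numbers standing in for a pool size at the training pixels.
observed = [1.0, 2.0, 3.0, 4.0]
good_prediction = [1.1, 1.9, 3.2, 3.9]
bad_prediction = [4.0, 3.0, 2.0, 1.0]

print(f"good fit: R^2 = {check_training_fit(observed, good_prediction):.3f}")
try:
    check_training_fit(observed, bad_prediction)
except QualityCheckError as err:
    print(f"procedure stopped: {err}")
```

Raising an exception rather than printing a warning means a failed check cannot be overlooked in a batch job.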

05/05/2021

TODO:

  • revise the varlist.json to be more flexible regarding varying variables/dimensions in the restart files of ORCHIDEE versions (Vlad)
  • MICT restart file: which variables are needed and which are not? What do the dimensions stand for? (Jinfeng)
  • visualization of the quality of the training (Daniel)
  • CNP-MIMICS training data (Daniel)

19/05/2021

Progress since last meeting:

  • MICT: deepC_a, deepC_s and deepC_p are state variables. The variable carbon stores the depth-integrated SOC information and can be derived from the other three.
  • new JSON syntax proposed for more flexibility. NEXT: Yan and Yilong discuss the feasibility of introducing the concept
  • evaluation tool: LOOCV (optional, for developers), quick check plots (mandatory, for users). NEXT: finalize and upload to GitHub
  • evaluation tools: different statistical variables to be tested; trade-off between user-friendliness and information content
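For readers unfamiliar with LOOCV (leave-one-out cross-validation), the idea can be sketched as follows: each training site is left out in turn, the model is fitted on the remaining sites, and the left-out site is predicted. The sketch uses a one-predictor least-squares line purely as a stand-in for the machine learning model of the tools, and made-up numbers; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Root-mean-square error of leave-one-out predictions."""
    sq_errors = []
    for i in range(len(xs)):
        # Fit on all sites except site i ...
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        # ... then predict the site that was left out.
        prediction = slope * xs[i] + intercept
        sq_errors.append((ys[i] - prediction) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Made-up predictor and target values for five training sites.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
target_exact = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly linear
target_noisy = [1.2, 2.7, 5.3, 6.8, 9.1]   # linear with scatter

print(f"LOOCV RMSE (exact): {loocv_rmse(predictor, target_exact):.3g}")
print(f"LOOCV RMSE (noisy): {loocv_rmse(predictor, target_noisy):.3g}")
```

Because the model is refitted once per site, LOOCV becomes expensive for costly models and many sites, which fits its role as an optional developer check next to the quicker user plots.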