= Spin up with a Machine Learning approach =


== What is it about? ==
Aim: develop a spinup acceleration procedure which is independent of the model version. The idea is to develop a Python tool set which can be applied to the ORCHIDEE family of models.

== How can I contribute to this effort? ==
Please contact D. Goll if you want to join. Some examples of contributions we would benefit from are:
* data from conventional spinup simulations
* expertise in how to link the tools to other tools such as libIGCM, ORCHIDAS, etc.
* expertise in how to host/distribute/maintain the software
* expertise in machine learning and Python


== Task force members ==
Daniel Goll,
Yan Sun,
Jinfeng Chang,
Yilong Wang,
Yuanyuan Huang,
Vladislav Bastrikov,
Nicolas Viovy,
Matt McGrath


== Status reports ==
=== 26/01/2021 ===
* DONE: Proof of concept for ORCHIDEE-CNP v1.2
* ONGOING: Finding a common setup for pixel selection applicable to all ORCHIDEE versions
* ONGOING: Collecting data from other ORCHIDEE versions for testing
* ONGOING: Translating MATLAB code into Python
* ONGOING: Cleaning the code
* ONGOING: Recruiting task force members

=== 16/02/2021 ===
Yan gave a presentation on progress with the Python coding, results for CNP and trunk, and the timeline for the next 2 months.

* Input files: restart + climate forcing (not the hist file, as ORCHIDEE might introduce noise)
* K-means clustering: add a plot showing the total distance vs. k to check whether the chosen number of clusters is appropriate (part of the monitoring information for the user)
* Add checks and quality statistics to monitor whether each step performs well, and stop the procedure if the results fail minimum quality criteria (e.g. stop if the machine learning fails to predict the training pixels)
* Externalize all parameters of the routines in one file.
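The "total distance vs. k" monitoring curve mentioned above can be sketched as follows. This is only a minimal illustration, not the task force's code: it assumes plain Lloyd's k-means with a deterministic farthest-point seeding, a made-up toy data set (`points`), and hypothetical function names. The actual tools may well use a library implementation instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: take the first point, then repeatedly add
    the point that is farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centres))
        )
    return centres


def kmeans_total_distance(points, k, n_iter=50):
    """Run Lloyd's k-means and return the total squared distance of all
    points to their nearest cluster centre."""
    centres = farthest_point_init(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centres[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centre.
        new_centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sum(min(squared_distance(p, c) for c in centres) for p in points)


# Toy data (assumption): three well-separated groups of "pixels" in 2D.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

for k in range(1, 6):
    print(k, round(kmeans_total_distance(points, k), 2))
```

Plotting these values against k gives the curve to show to the user: a chosen k well past the point where the curve flattens suggests more clusters than the data support.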

Work distribution:
* Matt: provide trunk v4.0 data (EQ files + results after 200 yr from scratch without analytical spinup)
* Yilong: refine and extend the code of tools 1 and 2
* Everyone: run tests with the refined tools for other forcings
* Yan: will focus on her PhD defence next month (20 March)

=== 03/03/2021 ===
* A first version of the Python tools is available for testing
* Yilong gave an overview

Next steps:

* put code and documentation on GitHub (Daniel, Vlad, Yilong)
* add documentation on how to run the tools; adapt them to other models (Yan, Yilong)
* everyone attempts to run the tools with their own model data (keep a log on GitHub of which model data were used)

Information/suggestions on running the tools:

* user specification files: more information is needed, e.g. which file name corresponds to equilibrium information and which to information from the transient run (Yan)
* things to improve: figure labelling, user spec file (simplify)
* try to use qsub to avoid blocking nodes on obelix



=== 16/03/2021 ===
* GitHub has been set up, and some initial tests and exchanges were done
* next: everyone tries and tests the tools on the two available datasets (CNP, trunk); report bugs, improvements, etc. on GitHub
* ongoing: acquire data from other models (versions): CABLE, ORCHIDEE-MICT, ORCHIDEE-<any>

* next meeting will be scheduled after discussion with Yan after her defence


=== 01/04/2021 ===

* GitHub code status: YY could run the code, DG did some tests modifying some inputs; all detected (minor) problems are listed as issues on GitHub
* TODO1 (Yan): explain in the README how to insert data from other simulations; separate the user specification files into experiment-specific settings (e.g. path to model output, forcing period (for tool 3), etc.) and model-version-specific settings (e.g. CNP, MICT, Trunk, CABLE, etc.).
* TODO2 (Yan): provide a tool 2 output which condenses the information currently spread over multiple files into a single file.
* TODO3 (Yan): work on the manuscript (incl. results from tests with other model versions (if feasible from TODO4) and CABLE)
* TODO4 (YY, DG, all): test the tools 1 and 2 when TODO1 and TODO2 are ready.
* TODO5 (DG): discuss with project team about the running scripts.
* TODO6 (Yan): code an evaluation tool (tool 3); check criteria are (1) high priority (total land C stock), (2) medium priority (land C stock per pixel), (3) others / drift over the forcing period (i.e. climate loop).
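As an illustration of the check criteria listed for tool 3, the sketch below computes one possible metric for each. All function names, the choice of metrics (relative error, RMSE, least-squares slope) and the units (stock per unit area times pixel area) are assumptions made for this example and do not describe the actual tool.

```python
import math


def total_stock(stock_per_area, area):
    """Area-weighted total land C stock (e.g. kg C m-2 times m2, assumed units)."""
    return sum(s * a for s, a in zip(stock_per_area, area))


def relative_error(predicted, reference):
    """Criterion 1 (assumed metric): relative error of the total stock."""
    return abs(predicted - reference) / abs(reference)


def pixel_rmse(predicted, reference):
    """Criterion 2 (assumed metric): root-mean-square error over pixels."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def drift_per_year(yearly_series):
    """Criterion 3 (assumed metric): least-squares slope of a yearly series
    of the total stock over the forcing period (the climate loop)."""
    n = len(yearly_series)
    t_mean = (n - 1) / 2
    y_mean = sum(yearly_series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(yearly_series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


# Made-up numbers, for illustration only.
area = [1.0, 0.5]
reference = [2.0, 4.0]   # conventional spinup
predicted = [2.2, 3.8]   # ML-accelerated spinup
print(relative_error(total_stock(predicted, area), total_stock(reference, area)))
print(pixel_rmse(predicted, reference))
print(drift_per_year([10.0, 12.0, 14.0, 16.0]))
```

A drift close to zero over the climate loop would indicate that the predicted state is near equilibrium. Acceptable thresholds for each criterion are a decision for the task force and are not fixed here.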

=== 14/04/2021 ===
Progress since last meeting:
* update of README 
* bug detected for biomass pool 
* evaluation tool for developers 


To do:
* push the latest developments to GitHub (e.g. README) (Yan)
* produce evaluation tool to test if the ML works for training sites (Yan) 
* send the data location for CNP-MIMICS runs and MICT runs (Yan)
* adapt the tool for MIMICS and MICT (Daniel + all) to test the tool structure and documentation
* produce restart files for CABLE (Yan)
* finalize the paper within 4 weeks 
* next meeting in 3 weeks due to Yan's move
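The evaluation of whether the ML works for the training sites could take the form of a simple quality gate. The sketch below is an assumption-laden example, not the planned tool: it uses the coefficient of determination (R²) between model values and ML predictions on the training pixels, and the names `SpinupQualityError`, `check_training_fit` and the threshold `min_r2=0.9` are all hypothetical.

```python
class SpinupQualityError(RuntimeError):
    """Raised when a step fails its minimum quality criterion."""


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed values."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def check_training_fit(observed, predicted, min_r2=0.9):
    """Return R2 on the training pixels, or stop if it is below min_r2
    (the default threshold is a placeholder, not an agreed value)."""
    r2 = r_squared(observed, predicted)
    if r2 < min_r2:
        raise SpinupQualityError(
            f"training fit too poor: R2={r2:.3f} < {min_r2}"
        )
    return r2


# Made-up values for three training pixels.
print(check_training_fit([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))
```

Raising an exception rather than only printing a warning means that a calling script stops at the failing step instead of carrying a poor fit into later steps.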


=== 05/05/2021 ===

TODO:
* revise the varlist.json to be more flexible regarding varying variables/dimensions in the restart files of ORCHIDEE versions (Vlad)
* MICT restart file: which variables are needed and which are not? What do the dimensions stand for? (Jinfeng)
* visualization of the quality of the training (Daniel)
* CNP-MIMICS training data (Daniel)
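One way to make a variable list tolerant of differing variables and dimensions across ORCHIDEE versions is to mark variables as required or optional and to declare their expected dimensions. The sketch below is purely illustrative: the JSON layout, the variable names and the dimension names are hypothetical and do not reflect the actual content of varlist.json.

```python
import json

# Hypothetical specification; NOT the real varlist.json layout.
SPEC_TEXT = """
{
  "variables": {
    "carbon":   {"dims": ["pool", "pft", "pixel"], "required": true},
    "nitrogen": {"dims": ["pool", "pft", "pixel"], "required": false}
  }
}
"""


def select_variables(spec, available):
    """Compare a specification with the variables found in a restart file.

    `available` maps each variable name to its tuple of dimension names.
    Returns the usable variable names and a list of problem messages.
    """
    selected, problems = [], []
    for name, entry in spec["variables"].items():
        if name not in available:
            # Optional variables may be absent in some model versions.
            if entry.get("required", True):
                problems.append(f"missing required variable: {name}")
            continue
        if list(available[name]) != entry["dims"]:
            problems.append(
                f"unexpected dimensions for {name}: {list(available[name])}"
            )
            continue
        selected.append(name)
    return selected, problems


spec = json.loads(SPEC_TEXT)
# A version without nitrogen still passes, since nitrogen is optional here.
print(select_variables(spec, {"carbon": ("pool", "pft", "pixel")}))
```

Collecting the problems in a list instead of failing on the first one lets the user see every mismatch between a restart file and the specification in a single run.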