Changes between Version 20 and Version 21 of Doc/Running


Timestamp:
09/24/15 16:33:38 (9 years ago)
Author:
aclsce
Comment:

--

  • Doc/Running

    v20 v21  
    9393 
    9494}}} 
    95 ## How to continue or restart a simulation ## 
    96  1. If you want to continue an existing or finished simulation, change the simulation end date in the `config.card` file. Do not change the simulation start date. 
     95 
     96## run.card at the end of a simulation ## 
     97At the end of your simulation, the !PeriodState parameter of the ''run.card'' file indicates whether the simulation '''completed''' or was aborted due to a '''Fatal''' error. 
     98[[BR]]This file contains the following sections : 
     99 * Configuration : shows how many integration steps were simulated and what the next integration step would be if the experiment were continued.  
     100{{{ 
     101[Configuration] 
     102#lastPREFIX 
     103OldPrefix=        # ---> Prefix of the last created files during the simulation = JobName + date of the last period. Used for the Restart 
     104#Warning : OldPrefix not used anymore from libIGCM_v2.5. 
     105#Compute date of loop 
     106PeriodDateBegin=   #  --->start date of the next period to be simulated 
     107PeriodDateEnd=     # ---> end date of the next period to be simulated 
     108CumulPeriod=       # ---> number of already simulated periods  
     109# State of Job "Start", "Running", "OnQueue", "Completed" 
     110PeriodState="Completed"    
     111         
     112SubmitPath=   # ---> Submission directory 
     113}}}      
     114 * !PostProcessing :  returns information about the post processing status 
     115{{{ 
     116[PostProcessing] 
     117TimeSeriesRunning=n   # ---> indicates whether the time series are running 
     118TimeSeriesCompleted=20091231   # ---> indicates the date of the last time series produced by the post processing 
     119}}} 
     120 * Log : returns technical (run-time) information such as the size of your executable and the execution time of each integration step. 
     121{{{ 
     122[Log] 
     123# Executables Size 
     124LastExeSize=() 
     125 
     126#--------------------------------- 
     127# CumulPeriod | PeriodDateBegin |   PeriodDateEnd |        RunDateBegin |          RunDateEnd |     RealCpuTime |     UserCpuTime |      SysCpuTime | ExeDate 
     128#           1 |        20000101 |        20000131 | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 |       798.33000 |         0.37000 |         3.05000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 
     129#           2 |        20000201 |        20000228 | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 |       718.16000 |         0.36000 |         3.39000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 
     130}}} 
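As an illustration only, the period table of the [Log] section can be read with a few lines of Python. This is a minimal sketch which assumes the table keeps the `#`-prefixed, `|`-separated layout shown above; the function name `parse_log_rows` is hypothetical and is not part of libIGCM.

```python
def parse_log_rows(text):
    """Return the period rows of the [Log] table as a list of dicts.

    Assumption: table lines start with '#' and use '|' as the column
    separator, and the first such line is the header.
    """
    header, rows = None, []
    for line in text.splitlines():
        # Skip everything that is not a '#'-prefixed table line.
        if not line.startswith("#") or "|" not in line:
            continue
        cells = [cell.strip() for cell in line.lstrip("#").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


sample_log = """\
[Log]
# Executables Size
LastExeSize=()

#---------------------------------
# CumulPeriod | PeriodDateBegin |   PeriodDateEnd |        RunDateBegin |          RunDateEnd |     RealCpuTime |     UserCpuTime |      SysCpuTime | ExeDate
#           1 |        20000101 |        20000131 | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 |       798.33000 |         0.37000 |         3.05000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43
#           2 |        20000201 |        20000228 | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 |       718.16000 |         0.36000 |         3.39000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43
"""

for row in parse_log_rows(sample_log):
    print(row["CumulPeriod"], row["PeriodDateBegin"], row["RealCpuTime"])
```

Each row is returned as strings, so values such as `RealCpuTime` still need to be converted with `float()` before any arithmetic.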
     131If the [#run.cardattheendofasimulation run.card] file indicates a problem at the end of the simulation, you can check your Script_Output file for more details. See [wiki:DocGmonitor more details here]. 
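The check of !PeriodState can be scripted. The sketch below uses Python's standard `configparser` module and assumes that a ''run.card'' file is close enough to INI syntax to be parsed this way (section headers in brackets, `#` comments, values optionally quoted). The helper name `read_period_state` is hypothetical.

```python
import configparser


def read_period_state(text):
    """Return (PeriodState, CumulPeriod) from the text of a run.card file.

    Assumption: run.card is INI-like; inline '#' comments are stripped
    and surrounding double quotes are removed from PeriodState.
    """
    parser = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=("#",)
    )
    parser.read_string(text)
    config = parser["Configuration"]
    state = config.get("PeriodState", fallback="").strip().strip('"')
    cumul = config.get("CumulPeriod", fallback="").strip()
    return state, (int(cumul) if cumul else None)


sample_card = """\
[Configuration]
PeriodDateBegin= 20000301
PeriodDateEnd= 20000331
CumulPeriod= 2
# State of Job "Start", "Running", "OnQueue", "Completed"
PeriodState="Completed"

SubmitPath=   # ---> Submission directory
"""

state, cumul = read_period_state(sample_card)
if state != "Completed":
    print("Check the Script_Output file: PeriodState is", state)
else:
    print("Simulation completed after", cumul, "periods")
```

To use it on a real file, pass the file contents, for example `read_period_state(open("run.card").read())`.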
     132 
     133## Script_Output_JobName ## 
     134A Script_Output_JobName file is created for each job executed. It contains the simulation job output log (list of the executed scripts, management of the I/O scripts). 
     135[[BR]] 
     136This file consists mainly of three parts :  
     137 * copying and handling of input and parameter files 
     138 * running the model  
     139 * copying of output files and launching of post-processing steps (rebuild and pack) 
     140These three parts are delimited as follows :  
     141{{{ 
     142####################################### 
     143#       ANOTHER GREAT SIMULATION      # 
     144####################################### 
     145 
     146 1st part (copying and handling of the input and parameter files) 
     147 
     148####################################### 
     149#      DIR BEFORE RUN EXECUTION       # 
     150####################################### 
     151 
     152 2nd part (running the model) 
     153 
     154####################################### 
     155#       DIR AFTER RUN EXECUTION       # 
     156####################################### 
     157 
     158 3rd part (copying of output files and launching of post-processing steps (rebuild and pack)) 
     159 
     160}}} 
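When a Script_Output file is long, it can help to cut it at the two banners shown above. The following Python sketch is one possible way to do this; it assumes each banner title appears exactly once and is spelled as above, and the function name `split_script_output` is hypothetical.

```python
def split_script_output(text):
    """Split a Script_Output log into its three parts.

    Assumption: the titles 'DIR BEFORE RUN EXECUTION' and
    'DIR AFTER RUN EXECUTION' each appear once, in that order.
    The row of '#' above a banner title stays with the previous part.
    """
    parts = {"input": [], "run": [], "output": []}
    current = "input"
    for line in text.splitlines():
        if "DIR BEFORE RUN EXECUTION" in line:
            current = "run"
        elif "DIR AFTER RUN EXECUTION" in line:
            current = "output"
        parts[current].append(line)
    return {name: "\n".join(lines) for name, lines in parts.items()}


sample_output = """\
#######################################
#       ANOTHER GREAT SIMULATION      #
#######################################
copying inputs
#######################################
#      DIR BEFORE RUN EXECUTION       #
#######################################
running model
#######################################
#       DIR AFTER RUN EXECUTION       #
#######################################
saving outputs
"""

parts = split_script_output(sample_output)
print(parts["run"])
```

If the simulation crashed, the "run" part is usually the first place to look, since it covers the execution of the model itself.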
     161 
     162## The output files ## 
     163 
     164The output files are stored on file servers. Their location follows a standardized nomenclature: IGCM_OUT/!TagName/[!SpaceName]/[!ExperimentName]/!JobName/, with separate subdirectories for the "Output" and "Analyse" files of each component (e.g. ATM/Output,  ATM/Analyse) and for DEBUG, RESTART, ATLAS and MONITORING. 
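
The nomenclature can be sketched in Python as follows. This is only an illustration: it assumes that the bracketed elements (!SpaceName and !ExperimentName) are simply left out when they are not set, and the function name `output_dir` and the example values are hypothetical.

```python
import posixpath


def output_dir(tag_name, job_name, space_name=None, experiment_name=None,
               root="IGCM_OUT"):
    """Build IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName.

    Assumption: optional (bracketed) elements are omitted when empty.
    """
    parts = [root, tag_name, space_name, experiment_name, job_name]
    return posixpath.join(*[part for part in parts if part])


# Hypothetical names, for illustration only.
base = output_dir("MyTag", "EXP00", space_name="MySpace",
                  experiment_name="MyExperiment")
print(posixpath.join(base, "ATM", "Output"))
```

The resulting relative path is then located under the storage space of the computing centre, as described below.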
     165 
     166Before the packs are executed, this directory structure is stored  
     167 * on the $SCRATCHDIR at TGCC 
     168 * on the $WORKDIR at IDRIS 
     169 
     170After the packs have been executed (see diagrams below), this tree is stored 
     171 * on the $CCCSTOREDIR and the $CCCWORKDIR at TGCC  
     172 * on the Ergon machine at IDRIS  
     173 
     174### Here is the storage directory structure of the output files produced at TGCC ### 
     175 
     176[[Image(Resultats-TGCC.jpg, 50%)]] 
     177 
     178### Here is the storage directory structure of the output files produced at IDRIS ### 
     179 
     180[[Image(Resultats-IDRIS.jpg, 50%)]] 
     181 
     182## Debug/ directory ##  
     183A Debug/ directory is created if the simulation crashed. This directory contains text files from each of the model components to help you find the reasons for the crash. See also [wiki:DocGmonitor#Debug the chapter on monitoring and debugging]. 
     184 
     185## How to continue or restart a simulation? ## 
     186 1. If you want to continue an existing and finished simulation, change the simulation end date in the `config.card` file. Do not change the simulation start date. 
    97187 1. In the `run.card` file you must:  
    98188  * check that the `PeriodDateBegin` and `PeriodDateEnd` variables match the next integration step of your simulation (e.g. if you just finished May 2000 and you want to integrate one month, set `PeriodDateBegin= 20000601` and `PeriodDateEnd= 20000630`)  
     
    110200ccc_msub Job_EXP00 or llsubmit Job_EXP00 
    111201}}} 
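
The dates of the next integration step can be computed rather than typed by hand. The Python sketch below assumes one-month periods and the standard Gregorian calendar; a simulation using another calendar (for example one without leap years or with 360-day years) would need different rules. The function name `next_month_period` is hypothetical.

```python
import calendar
from datetime import date, timedelta


def next_month_period(last_period_end):
    """Return (PeriodDateBegin, PeriodDateEnd) for the month that follows.

    last_period_end is the YYYYMMDD end date of the last simulated period.
    Assumption: one-month periods in the Gregorian calendar.
    """
    last = date(int(last_period_end[:4]),
                int(last_period_end[4:6]),
                int(last_period_end[6:8]))
    begin = last + timedelta(days=1)
    last_day = calendar.monthrange(begin.year, begin.month)[1]
    end = begin.replace(day=last_day)
    return begin.strftime("%Y%m%d"), end.strftime("%Y%m%d")


# May 2000 has just finished: the next period is June 2000.
print(next_month_period("20000531"))  # ('20000601', '20000630')
```

Writing the dates in the full eight-digit YYYYMMDD form avoids a missing digit in `PeriodDateEnd`.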
    112  
    113 ## The output files ## 
    114  
    115 The output files are stored on file servers. Their name follows a standardized nomenclature: IGCM_OUT/!TagName/[!SpaceName]/[!ExperimentName]/!JobName/ in different subdirectories for each "Output" and "Analyse" component (e.g. ATM/Output,  ATM/Analyse), DEBUG, RESTART, ATLAS and MONITORING. 
    116  
    117 Prior to the packs execution, this directory structure is stored  
    118  * on the $SCRATCHDIR at TGCC 
    119  * on the $WORKDIR at IDRIS 
    120  
    121 After the packs execution (see diagram below), this tree is stored 
    122  * on the $CCCSTOREDIR and the $CCCWORKDIR at TGCC  
    123  * on the Ergon machine at IDRIS  
    124  
    125 ### Here is the storage directory structure of the output files produced at TGCC ### 
    126  
    127 [[Image(Resultats-TGCC.jpg, 50%)]] 
    128  
    129 ### Here is the storage directory structure of the output files produced at IDRIS ### 
    130  
    131 [[Image(Resultats-IDRIS.jpg, 50%)]] 
    132  
    133 ## run.card at the end of a simulation ## 
    134 At the end of your simulation, the !PeriodState parameter of the ''run.card'' file indicates whether the simulation '''completed''' or was aborted due to a '''Fatal''' error. 
    135 [[BR]]This file contains the following sections : 
    136  * Configuration : shows how many integration steps were simulated and what the next integration step would be if the experiment were continued.  
    137 {{{ 
    138 [Configuration] 
    139 #last PREFIX 
    140 OldPrefix=        # ---> prefix of the last created files during the simulation = JobName + date of the last period. Used for the Restart 
    141 #Compute date of loop 
    142 PeriodDateBegin=   #  --->start date of the next period to be simulated 
    143 PeriodDateEnd=     # ---> end date of the next period to be simulated 
    144 CumulPeriod=       # ---> number of already simulated periods  
    145 # State of Job "Start", "Running", "OnQueue", "Completed" 
    146 PeriodState="Completed"    
    147          
    148 SubmitPath=   # ---> Submission directory 
    149 }}}      
    150  * !PostProcessing :  returns information about the post processing status 
    151 {{{ 
    152 [PostProcessing] 
    153 TimeSeriesRunning=n   # ---> indicates if the timeSeries are running 
    154 TimeSeriesCompleted=20091231   # ---> indicates the date of the last TimeSerie produced by the post processing 
    155 }}} 
    156  * Log : returns technical (run-time) information such as the size of your executable and the execution time of each integration step. 
    157 {{{ 
    158 [Log] 
    159 # Executables Size 
    160 LastExeSize=() 
    161  
    162 #--------------------------------- 
    163 # CumulPeriod | PeriodDateBegin |   PeriodDateEnd |        RunDateBegin |          RunDateEnd |     RealCpuTime |     UserCpuTime |      SysCpuTime | ExeDate 
    164 #           1 |        20000101 |        20000131 | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 |       798.33000 |         0.37000 |         3.05000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 
    165 #           2 |        20000201 |        20000228 | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 |       718.16000 |         0.36000 |         3.39000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 
    166 }}} 
    167  
    168  
    169 ## Script_Output_JobName ## 
    170 A Script_Output_JobName file is created for each job executed. It contains the simulation job output log (list of the executed scripts, management of the I/O scripts). 
    171 [[BR]] 
    172 This file contains three parts :  
    173  * copying the input files 
    174  * running the model  
    175  * post processing 
    176 These three parts are defined as below :  
    177 {{{ 
    178 ####################################### 
    179 #       ANOTHER GREAT SIMULATION      # 
    180 ####################################### 
    181  
    182  1st part (copying the input files) 
    183  
    184 ####################################### 
    185 #      DIR BEFORE RUN EXECUTION       # 
    186 ####################################### 
    187  
    188  2nd part (running the model) 
    189  
    190 ####################################### 
    191 #       DIR AFTER RUN EXECUTION       # 
    192 ####################################### 
    193  
    194  3rd part (post processing) 
    195  
    196 }}} 
    197 If the [#run.cardattheendofasimulation run.card] file indicates a problem at the end of the simulation, you can check your Script_Output file for more details. See [wiki:DocGmonitor more details here]. 
    198  
    199  
    200 ## Debug/ directory ##  
    201 A Debug/ directory is created if the simulation crashed. This directory will contain text files from each of the model components to help you find the reasons for the crash. See also [wiki:DocGmonitor#Debug the chapter on monitoring and debugging]. 
    202202 
    203203----