95 | | ## How to continue or restart a simulation ## |
96 | | 1. If you want to continue an existing or finished simulation, change the simulation end date in the `config.card` file. Do not change the simulation start date. |
| 95 | |
| 96 | ## run.card at the end of a simulation ## |
| 97 | At the end of your simulation, the !PeriodState parameter of the ''run.card'' file indicates whether the simulation has been '''completed''' or was aborted due to a '''Fatal''' error. |
| 98 | [[BR]]This file contains the following sections: |
| 99 | * Configuration : allows you to find out how many integration steps were simulated and what the next integration step would be if the experiment were continued. |
| 100 | {{{ |
| 101 | [Configuration] |
| 102 | #lastPREFIX |
| 103 | OldPrefix= # ---> Prefix of the last created files during the simulation = JobName + date of the last period. Used for the Restart |
| 104 | #Warning : OldPrefix not used anymore from libIGCM_v2.5. |
| 105 | #Compute date of loop |
| 106 | PeriodDateBegin= # ---> start date of the next period to be simulated |
| 107 | PeriodDateEnd= # ---> end date of the next period to be simulated |
| 108 | CumulPeriod= # ---> number of already simulated periods |
| 109 | # State of Job "Start", "Running", "OnQueue", "Completed" |
| 110 | PeriodState="Completed" |
| 111 | |
| 112 | SubmitPath= # ---> Submission directory |
| 113 | }}} |
| 114 | * !PostProcessing : returns information about the post processing status |
| 115 | {{{ |
| 116 | [PostProcessing] |
| 117 | TimeSeriesRunning=n # ---> indicates if the timeSeries are running |
| 118 | TimeSeriesCompleted=20091231 # ---> indicates the date of the last TimeSerie produced by the post processing |
| 119 | }}} |
| 120 | * Log : returns technical (run-time) information such as the size of your executable and the execution time of each integration step. |
| 121 | {{{ |
| 122 | [Log] |
| 123 | # Executables Size |
| 124 | LastExeSize=() |
| 125 | |
| 126 | #--------------------------------- |
| 127 | # CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime | SysCpuTime | ExeDate |
| 128 | # 1 | 20000101 | 20000131 | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 | 798.33000 | 0.37000 | 3.05000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 |
| 129 | # 2 | 20000201 | 20000228 | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 | 718.16000 | 0.36000 | 3.39000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 |
| 130 | }}} |
| 131 | If the [#run.cardattheendofasimulation run.card] file indicates a problem at the end of the simulation, you can check your Script_Output file for more details. See [wiki:DocGmonitor more details here]. |
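As a minimal sketch of how the state described above could be checked from a script, the following Python snippet reads a run.card with the standard `configparser` module. It assumes the file is close enough to INI syntax for `configparser` to accept it, which is not guaranteed by libIGCM. The sample dates, the submission path and the helper names `read_run_card` and `is_completed` are made up for illustration.

```python
import configparser

# Hypothetical run.card content, modelled on the layout shown above.
# Dates, counters and the path are invented sample values.
SAMPLE_RUN_CARD = """\
[Configuration]
OldPrefix= # ---> not used anymore
#Compute date of loop
PeriodDateBegin=20000301 # ---> start date of the next period
PeriodDateEnd=20000331 # ---> end date of the next period
CumulPeriod=2
# State of Job "Start", "Running", "OnQueue", "Completed"
PeriodState="Completed"

SubmitPath=/hypothetical/submit/dir # ---> Submission directory

[PostProcessing]
TimeSeriesRunning=n
TimeSeriesCompleted=20091231

[Log]
# Executables Size
LastExeSize=()
"""


def read_run_card(text):
    """Parse run.card text into {section: {key: value}} with plain strings."""
    parser = configparser.ConfigParser(
        interpolation=None,              # do not treat '%' specially
        inline_comment_prefixes=("#",),  # drop trailing '# ---> ...' notes
    )
    parser.optionxform = str             # keep the original key capitalisation
    parser.read_string(text)
    return {
        section: {key: value.strip().strip('"')
                  for key, value in parser[section].items()}
        for section in parser.sections()
    }


def is_completed(card):
    """Return True if the Configuration section reports PeriodState Completed."""
    return card["Configuration"]["PeriodState"] == "Completed"


if __name__ == "__main__":
    card = read_run_card(SAMPLE_RUN_CARD)
    print("PeriodState:", card["Configuration"]["PeriodState"])
    print("Next period starts:", card["Configuration"]["PeriodDateBegin"])
```

To use it on a real file, pass the content of your own run.card (for example `open("run.card").read()`) to `read_run_card`. If the parser rejects the file, the INI assumption does not hold for that version and the file has to be inspected by hand.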
| 132 | |
| 133 | ## Script_Output_JobName ## |
| 134 | A Script_Output_JobName file is created for each job executed. It contains the simulation job output log (list of the executed scripts, management of the I/O scripts). |
| 135 | [[BR]] |
| 136 | This file consists mainly of three parts: |
| 137 | * copying and handling of input and parameter files |
| 138 | * running the model |
| 139 | * copying of output files and launching of post processing steps (rebuild and pack) |
| 140 | These three parts are delimited as follows: |
| 141 | {{{ |
| 142 | ####################################### |
| 143 | # ANOTHER GREAT SIMULATION # |
| 144 | ####################################### |
| 145 | |
| 146 | 1st part (copying and handling of the input and parameter files) |
| 147 | |
| 148 | ####################################### |
| 149 | # DIR BEFORE RUN EXECUTION # |
| 150 | ####################################### |
| 151 | |
| 152 | 2nd part (running the model) |
| 153 | |
| 154 | ####################################### |
| 155 | # DIR AFTER RUN EXECUTION # |
| 156 | ####################################### |
| 157 | |
| 158 | 3rd part (copying of outputs files and launching of post processing steps (rebuild and pack)) |
| 159 | |
| 160 | }}} |
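The three banners shown above can be used to cut a Script_Output file into its parts when looking for the step that failed. The sketch below is one possible way to do this in Python. It assumes that each banner title appears on its own line surrounded by `#` characters, exactly as in the excerpt; the function name `split_script_output` and the part names `input`, `run` and `output` are made up for illustration.

```python
# Banner titles taken from the excerpt above, mapped to illustrative part names.
BANNERS = {
    "ANOTHER GREAT SIMULATION": "input",   # copying and handling of input files
    "DIR BEFORE RUN EXECUTION": "run",     # running the model
    "DIR AFTER RUN EXECUTION": "output",   # copying outputs, post processing
}


def split_script_output(text):
    """Split the log text into lists of lines, one list per part.

    Lines found before the first banner are collected under 'preamble'.
    Lines made only of '#' characters (the banner frames) are dropped.
    """
    parts = {"preamble": []}
    current = "preamble"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"#"}:
            continue  # frame line of a banner
        title = stripped.strip("#").strip()
        if stripped.startswith("#") and title in BANNERS:
            current = BANNERS[title]
            parts[current] = []
            continue
        parts[current].append(line)
    return parts


if __name__ == "__main__":
    sample = "\n".join([
        "#######################################",
        "#       ANOTHER GREAT SIMULATION      #",
        "#######################################",
        "copying input files",
        "#######################################",
        "#      DIR BEFORE RUN EXECUTION       #",
        "#######################################",
        "running the model",
        "#######################################",
        "#       DIR AFTER RUN EXECUTION       #",
        "#######################################",
        "copying output files",
    ])
    for name, lines in split_script_output(sample).items():
        print(name, len(lines))
```

If a run stops before the model starts, the `run` and `output` keys are simply absent from the result, which already tells you in which part the job ended.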
| 161 | |
| 162 | ## The output files ## |
| 163 | |
| 164 | The output files are stored on file servers. Their paths follow a standardized nomenclature: IGCM_OUT/!TagName/[!SpaceName]/[!ExperimentName]/!JobName/, with separate subdirectories for each "Output" and "Analyse" component (e.g. ATM/Output, ATM/Analyse) and for DEBUG, RESTART, ATLAS and MONITORING. |
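The nomenclature above can be illustrated with a short Python sketch that assembles the path from its components. The bracketed components are treated as optional here, which is an assumption about the bracket notation. The function names `output_root` and `component_dirs` and the sample values `MYTAG`, `MYSPACE`, `MYEXP` and `MYJOB` are made up for illustration.

```python
from posixpath import join


def output_root(tag_name, job_name, space_name=None, experiment_name=None):
    """Build IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName.

    space_name and experiment_name are skipped when they are not given.
    """
    parts = ["IGCM_OUT", tag_name]
    parts += [name for name in (space_name, experiment_name) if name]
    parts.append(job_name)
    return join(*parts)


def component_dirs(root, component):
    """Return the Output and Analyse subdirectories of one component."""
    return [join(root, component, kind) for kind in ("Output", "Analyse")]


if __name__ == "__main__":
    root = output_root("MYTAG", "MYJOB", space_name="MYSPACE",
                       experiment_name="MYEXP")
    print(root)
    for path in component_dirs(root, "ATM"):
        print(path)
```

The resulting relative path has to be prefixed with the storage location that applies on your machine.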
| 165 | |
| 166 | Prior to the packs execution, this directory structure is stored |
| 167 | * on the $SCRATCHDIR at TGCC |
| 168 | * on the $WORKDIR at IDRIS |
| 169 | |
| 170 | After the packs execution (see diagram below), this tree is stored |
| 171 | * on the $CCCSTOREDIR and the $CCCWORKDIR at TGCC |
| 172 | * on the Ergon machine at IDRIS |
| 173 | |
| 174 | ### Here is the storage directory structure of the output files produced at TGCC ### |
| 175 | |
| 176 | [[Image(Resultats-TGCC.jpg, 50%)]] |
| 177 | |
| 178 | ### Here is the storage directory structure of the output files produced at IDRIS ### |
| 179 | |
| 180 | [[Image(Resultats-IDRIS.jpg, 50%)]] |
| 181 | |
| 182 | ## Debug/ directory ## |
| 183 | A Debug/ directory is created if the simulation crashed. This directory contains text files from each of the model components to help you find the reasons for the crash. See also [wiki:DocGmonitor#Debug the chapter on monitoring and debugging]. |
| 184 | |
| 185 | ## How to continue or restart a simulation? ## |
| 186 | 1. If you want to continue an existing and finished simulation, change the simulation end date in the `config.card` file. Do not change the simulation start date. |
112 | | |
113 | | ## The output files ## |
114 | | |
115 | | The output files are stored on file servers. Their paths follow a standardized nomenclature: IGCM_OUT/!TagName/[!SpaceName]/[!ExperimentName]/!JobName/, with separate subdirectories for each "Output" and "Analyse" component (e.g. ATM/Output, ATM/Analyse) and for DEBUG, RESTART, ATLAS and MONITORING. |
116 | | |
117 | | Prior to the packs execution, this directory structure is stored |
118 | | * on the $SCRATCHDIR at TGCC |
119 | | * on the $WORKDIR at IDRIS |
120 | | |
121 | | After the packs execution (see diagram below), this tree is stored |
122 | | * on the $CCCSTOREDIR and the $CCCWORKDIR at TGCC |
123 | | * on the Ergon machine at IDRIS |
124 | | |
125 | | ### Here is the storage directory structure of the output files produced at TGCC ### |
126 | | |
127 | | [[Image(Resultats-TGCC.jpg, 50%)]] |
128 | | |
129 | | ### Here is the storage directory structure of the output files produced at IDRIS ### |
130 | | |
131 | | [[Image(Resultats-IDRIS.jpg, 50%)]] |
132 | | |
133 | | ## run.card at the end of a simulation ## |
134 | | At the end of your simulation, the !PeriodState parameter of the ''run.card'' file indicates whether the simulation has been '''completed''' or was aborted due to a '''Fatal''' error. |
135 | | [[BR]]This file contains the following sections: |
136 | | * Configuration : allows you to find out how many integration steps were simulated and what the next integration step would be if the experiment were continued. |
137 | | {{{ |
138 | | [Configuration] |
139 | | #last PREFIX |
140 | | OldPrefix= # ---> prefix of the last created files during the simulation = JobName + date of the last period. Used for the Restart |
141 | | #Compute date of loop |
142 | | PeriodDateBegin= # ---> start date of the next period to be simulated |
143 | | PeriodDateEnd= # ---> end date of the next period to be simulated |
144 | | CumulPeriod= # ---> number of already simulated periods |
145 | | # State of Job "Start", "Running", "OnQueue", "Completed" |
146 | | PeriodState="Completed" |
147 | | |
148 | | SubmitPath= # ---> Submission directory |
149 | | }}} |
150 | | * !PostProcessing : returns information about the post processing status |
151 | | {{{ |
152 | | [PostProcessing] |
153 | | TimeSeriesRunning=n # ---> indicates if the timeSeries are running |
154 | | TimeSeriesCompleted=20091231 # ---> indicates the date of the last TimeSerie produced by the post processing |
155 | | }}} |
156 | | * Log : returns technical (run-time) information such as the size of your executable and the execution time of each integration step. |
157 | | {{{ |
158 | | [Log] |
159 | | # Executables Size |
160 | | LastExeSize=() |
161 | | |
162 | | #--------------------------------- |
163 | | # CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime | SysCpuTime | ExeDate |
164 | | # 1 | 20000101 | 20000131 | 2013-02-15T16:14:15 | 2013-02-15T16:27:34 | 798.33000 | 0.37000 | 3.05000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 |
165 | | # 2 | 20000201 | 20000228 | 2013-02-15T16:27:46 | 2013-02-15T16:39:44 | 718.16000 | 0.36000 | 3.39000 | ATM_Feb_15_16:13-OCE_Feb_15_15:56-CPL_Feb_15_15:43 |
166 | | }}} |
167 | | |
168 | | |
169 | | ## Script_Output_JobName ## |
170 | | A Script_Output_JobName file is created for each job executed. It contains the simulation job output log (list of the executed scripts, management of the I/O scripts). |
171 | | [[BR]] |
172 | | This file contains three parts: |
173 | | * copying the input files |
174 | | * running the model |
175 | | * post processing |
176 | | These three parts are delimited as follows: |
177 | | {{{ |
178 | | ####################################### |
179 | | # ANOTHER GREAT SIMULATION # |
180 | | ####################################### |
181 | | |
182 | | 1st part (copying the input files) |
183 | | |
184 | | ####################################### |
185 | | # DIR BEFORE RUN EXECUTION # |
186 | | ####################################### |
187 | | |
188 | | 2nd part (running the model) |
189 | | |
190 | | ####################################### |
191 | | # DIR AFTER RUN EXECUTION # |
192 | | ####################################### |
193 | | |
194 | | 3rd part (post processing) |
195 | | |
196 | | }}} |
197 | | If the [#run.cardattheendofasimulation run.card] file indicates a problem at the end of the simulation, you can check your Script_Output file for more details. See [wiki:DocGmonitor more details here]. |
198 | | |
199 | | |
200 | | ## Debug/ directory ## |
201 | | A Debug/ directory is created if the simulation crashed. This directory will contain text files from each of the model components to help you find the reasons for the crash. See also [wiki:DocGmonitor#Debug the chapter on monitoring and debugging]. |