Changes between Version 17 and Version 18 of ParallelismPerformances
- Timestamp: 2012-12-06T11:06:22+01:00
ParallelismPerformances
=== Patch Evaluation ===

==== Test: NCC forcing files (1°) ====

To study the influence of both the IO patches and the Load balance file, I ran a survey with the following setup:
* NCC forcing file: 360*180, 15238 land points
* ~~Loop 10 times over the same year to study the influence of the Load balance file.~~
* 10 years starting from scratch, to study the influence of the Load balance file
* sechiba_hist_level = 4
* stomate_hist_level = 5
* 125 variables written in the output!
* Monthly outputs
* Tests done on Curie
…
Notice that the patch is significant for a high number of processors (>16). For 32 and 48 processors, the gain is about 60% (this seems to be the optimum for NCC).
Beyond 48 processors, the gain diminishes.[[BR]]
~~'''Recommendations:'''~~
'''Recommendations for NCC forcing:'''
* NCC forcing files: 32 processors
* ~~CRUNCEP: ~128 processors (estimated because CRUNCEP has 4 times more points than NCC)~~
* ~~Other forcing files: use a linear relationship to estimate your number of processors. If you have 45000 points, you could use 32*3 = 96 processors.~~

~~__Summary__: ORCHIDEE should be used on more than 128 processors for the moment. If you increase the number of processors, you could lose time because of MPI communications and the multiple writing of output files.~~[[BR]]
To find out how the parallelization has been improved, you can read the following report [wiki:ParallelVersion here] (in French, sorry!). In that report, the optimal number of processors was estimated at 6 for NCC forcing files!

==== Test: CRU-NCEP (0.5°) ====

With Nicolas Viovy, we agreed on a common protocol to compare his version with the standard one:
* CRU-NCEP forcing file: 0.5°, ~60000 land points
* 3 years starting from scratch, to study the influence of the Load balance file
* sechiba_hist_level = 1
* stomate_hist_level = 1
* ~20 variables written in the output
* Monthly outputs
* Tests done on Curie
* REBUILD is done after the run
SECHIBA_hist_level and STOMATE_hist_level are deliberately set low, because Nicolas writes only about 20 variables in his output files.[[BR]]

'''Results:'''
* Nicolas's version:

|| Number of processors || Elapsed time ||
|| 32 || ~20 min (estimated) ||
|| 64 || 10 min ||
|| 128 || ~5 min ||

* Standard version (trunk, revision 1076):

|| Number of processors || Elapsed time ||
|| 16 || 24 min ||
|| 32 || 16 min ||
|| 48 || 13 min ||
|| 64 || 11 min ||
|| 128 || 9 min 30 s ||

'''Conclusion:'''
* Nicolas's version and the standard version perform similarly up to 64 processors. Beyond that, the standard version barely improves (11 min at 64 processors, 9 min 30 s at 128), whereas Nicolas's version keeps scaling (10 min to ~5 min).
* For CRU-NCEP forcing files, the optimal number of processors is 64. Don't use more: the extra processors consume computing hours for little gain in elapsed time.
* With the standard version, you can use the routing. This is not really possible with Nicolas's version.
* Two problems remain to be solved:
  - Change the output level of some variables: ORCHIDEE writes too many variables. We could set all the essential variables needed to perform a spin-up to level 1.
  - Why do we lose scalability when we use more than 64 processors?

'''ACTIONS:'''
* Redefine the output levels of ORCHIDEE variables
* Use Vampir to understand the behaviour of ORCHIDEE on a high number of processors
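To make the loss of scalability in the standard version concrete, the timings from the CRU-NCEP table can be turned into speedup, parallel efficiency and consumed core-hours. The sketch below is only an illustration. The helper `scaling_table` is hypothetical. It assumes the table values are wall-clock times for the whole run. Speedup and efficiency are measured relative to the 16-processor run, because no serial reference time was measured.

```python
# Hypothetical helper: derives strong-scaling metrics from measured run times.
# Times (minutes) are the standard-version (trunk r1076) CRU-NCEP figures;
# "9 min 30 s" is entered as 9.5. Assumption: these are wall-clock times of the whole run.
STANDARD_TIMES = {16: 24.0, 32: 16.0, 48: 13.0, 64: 11.0, 128: 9.5}


def scaling_table(times):
    """Return (procs, speedup, efficiency, core_hours) for each processor count.

    Speedup and efficiency are relative to the smallest processor count measured,
    because no serial (1-processor) reference time is available.
    """
    base_procs = min(times)
    base_time = times[base_procs]
    rows = []
    for procs in sorted(times):
        elapsed = times[procs]
        speedup = base_time / elapsed
        efficiency = speedup / (procs / base_procs)  # 1.0 means ideal linear scaling
        core_hours = procs * elapsed / 60.0           # computing time actually consumed
        rows.append((procs, speedup, efficiency, core_hours))
    return rows


if __name__ == "__main__":
    for procs, speedup, eff, hours in scaling_table(STANDARD_TIMES):
        print(f"{procs:4d} procs  speedup {speedup:4.2f}  "
              f"efficiency {eff:4.0%}  {hours:5.1f} core-hours")
```

With these figures, efficiency relative to 16 processors falls to about 55% at 64 processors and about 32% at 128. Over the same step, the consumed core-hours grow from about 11.7 to about 20.3, which is consistent with the advice not to go beyond 64 processors. The same function could be applied to Nicolas's timings, but his 32-processor value is only an estimate.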