Changes between Version 17 and Version 18 of ParallelismPerformances

Timestamp:: 2012-12-06T11:06:22+01:00 (12 years ago)
Author:: dsolyga
Comment:: --

ParallelismPerformances

-                      v17
+                      v18
 === Patch Evaluation ===
+==== Test : NCC forcing files (1°) ====
 To study the influence of both the IO patches and the Load balance file, I ran a series of tests with the following setup :
  * NCC forcing file : 360*180, 15238 land points
  * Loop 10 times over the same year to study the influence of the Load balance file.
+ * 10 years starting from scratch to study the influence of the Load balance file.
  * sechiba_hist_level = 4
  * stomate_hist_level = 5
+ * 125 variables written in the output !
  * Monthly outputs
  * Tests done on Curie
 …
 Note that the patch only gives a significant gain for a high number of processors (>16). For 32 and 48 processors, the gain is about 60%, which seems to be the optimum for NCC.
 Beyond 48 processors, the gain diminishes. [[BR]]
 ''' Recommendations  : '''
+''' Recommendations for NCC forcing : '''
  * NCC forcing files : 32 processors
- * CRUNCEP : ~128 processors (evaluation as CRUNCEP has 4 times more points than NCC)
- * Other forcing files : use a linear relationship to evaluate your number of processors. If you have 45000 points you could use 32*3 = 96 processors.
-__Summary__ : ORCHIDEE should not be used on more than 128 processors for the moment. If you increase the number of processors, you could lose time because of MPI communications and the multiple
-writes of output files.[[BR]]
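The linear rule of thumb from the v17 recommendations (removed in v18) can be sketched as follows. This is only an illustration: the function name `estimate_processors` and the rounding of the point ratio to the nearest integer are assumptions, chosen so that the sketch reproduces the two figures quoted in the text (45000 points giving 96 processors, and roughly 128 for CRU-NCEP).

```python
# Reference values taken from the NCC test described above.
NCC_LAND_POINTS = 15238   # land points in the NCC forcing file
NCC_PROCESSORS = 32       # recommended processor count for NCC


def estimate_processors(land_points: int) -> int:
    """Scale the NCC recommendation linearly with the number of land points.

    Assumption: the ratio is rounded to the nearest integer (minimum 1),
    which matches the "45000 points -> 32*3 = 96" example in the text.
    """
    ratio = max(1, round(land_points / NCC_LAND_POINTS))
    return NCC_PROCESSORS * ratio


if __name__ == "__main__":
    for points in (15238, 45000, 60000):
        print(points, "land points ->", estimate_processors(points), "processors")
```

Keep in mind that the CRU-NCEP test in v18 finds an optimum of 64 processors, so this linear estimate should be treated as a starting point to be checked against measurements, not as a recommendation.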
 To see how the parallelization has been improved, you can read the following report [wiki:ParallelVersion here] (in French, sorry!). In this report, the optimal number of
 processors was estimated at 6 for NCC forcing files!
+====  Test : CRU-NCEP (0.5°) ====
+Together with Nicolas Viovy, we agreed on a common protocol to compare his version with the standard one :
+ * CRU-NCEP forcing file : 0.5°, ~60000 land points
+ * 3 years starting from scratch to study the influence of the Load balance file.
+ * sechiba_hist_level = 1
+ * stomate_hist_level = 1
+ * ~20 variables written in the output !
+ * Monthly outputs
+ * Tests done on Curie
+ * REBUILD is done after the run
+SECHIBA_hist_level and STOMATE_hist_level are deliberately low, because Nicolas has about 20 variables in his output files.[[BR]]
+'''Results :'''
+ * Nicolas's version :
+||  Number of processors  ||  Time per processor  ||
+||  32                    ||  ~20 min (estimate)  ||
+||  64                    ||  10 min              ||
+||  128                   ||  ~5 min              ||
+ * Standard version (trunk, revision 1076)  :
+||  Number of processors  ||  Time per processor  ||
+||  16                    ||  24 min              ||
+||  32                    ||  16 min              ||
+||  48                    ||  13 min              ||
+||  64                    ||  11 min              ||
+||  128                   ||  9 min 30 s          ||
+''' Conclusion  :'''
+ * The performance of Nicolas's version and of the standard version is similar up to 64 processors. Beyond that, the standard version hardly improves any further.
+ * For CRU-NCEP forcing files, the optimal number of processors is 64. Don't use more : you would consume too much computing time.
+ * With the standard version, you can use the routing. This is not really possible with Nicolas's version.
+ * There are still two problems to solve :
+   - Change the output level of some variables : ORCHIDEE writes too many variables. We could set all the variables that are essential for performing a spin-up to level 1.
+   - Why do we lose scalability when we use more than 64 processors ?
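To make the loss of scalability concrete, the timings from the standard-version table can be turned into speedup, parallel efficiency and total CPU cost. This is a minimal sketch, assuming that "Time per processor" is the wall-clock time of the run and taking the 16-processor run as the baseline; the function name `scaling_report` is hypothetical.

```python
# Wall-clock minutes from the "Standard version" table above
# (assumption: "Time per processor" means elapsed time of the run).
STANDARD = {16: 24.0, 32: 16.0, 48: 13.0, 64: 11.0, 128: 9.5}


def scaling_report(timings):
    """Return (processors, speedup, efficiency, cpu_minutes) for each run.

    Speedup and efficiency are relative to the smallest processor count.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        efficiency = speedup / (procs / base_procs)
        cpu_minutes = procs * timings[procs]
        rows.append((procs, speedup, efficiency, cpu_minutes))
    return rows


if __name__ == "__main__":
    for procs, speedup, eff, cpu in scaling_report(STANDARD):
        print(f"{procs:4d} procs  speedup {speedup:4.2f}  "
              f"efficiency {eff:4.0%}  {cpu:6.0f} CPU-min")
```

Under these assumptions, going from 64 to 128 processors shortens the run from 11 min to 9 min 30 s while the total cost rises from 704 to 1216 CPU-minutes, which is the sense in which using more than 64 processors wastes computing time.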
+''' ACTIONS : '''
+ * Redefine the output level of ORCHIDEE variables
+ * Use Vampir to understand the behaviour of ORCHIDEE on a high number of processors.