= Performance =

== Basic Performance Report ==

=== Overview ===

This document investigates the computing-time behavior of Orchidee MICT. The latest version (6.5) is slow: one simulated year at 0.5 degree takes around 8 h. It is therefore necessary to understand where this time is spent. Once the bottlenecks are identified, different solutions can be applied.

To this end, the code is profiled with several tools (VTune, Vampir, gprof, ...), which make it easy to identify the main hotspots in the code.

=== Report ===

attachment:performance_mict_albert_jornet_150616.pdf

[[PageOutline]]

== MICT V6 (3344 + PFT interpolation) Module computing time ==

The same configuration is used in every run, but each run activates one more module than the previous one. The goal is to show how much computing time each module adds.

[[Image(test_perf_matguimb.png, 80%)]]

== Mict R3359 (allinea map) ==

This profile was obtained with Allinea MAP on the Curie machine, using 16 MPI processes, for a 1-month simulation at 0.5 degree with IOIPSL. All components are compiled in production mode (fast).
{{{
################################################################################
                              Execution Sum Up
################################################################################
Jobid      : 4569560
Jobname    : M65_test
User       : p529jorn
Account    : gen6328@standard
Limits     : time = 4:10:00 , memory/task = 4000 Mo
Date       : submit = 15/04/2016 09:42:37 , start = 15/04/2016 09:52:13
Execution  : partition = standard , QoS = normal , Comment = (null)
Resources  : ntasks = 16 , cpus/task = 1 , ncpus = 16 , nodes = 1
             Nodes=curie4179 CPU_IDs=0-15 Mem=64000

Memory / step
--------------
                   Resident Size (Mo)                  Virtual Size (Go)
JobID        Max (Node:Task)           AveTask  Max (Node:Task)             AveTask
-----------  ------------------------  -------  --------------------------  -------

Accounting / step
------------------
JobID        JobName      Ntasks Ncpus Nnodes Layout  Elapsed Ratio CPusage Eff State
------------ ------------ ------ ----- ------ ------- ------- ----- ------- --- -----
4569560      M65_test          -    16      1 -       00:55:53  100       -   -     -
################################################################################
}}}

Screenshots:
 * Main: [[Image(main.png, 20%, title="main")]]
 * MPI: [[Image(mpi_mict_map.png, 20%, title="main")]]
 * Memory: [[Image(mem_mict_map.png, 20%, title="main")]]
 * IO: [[Image(io_mict_map.png, 20%, title="main")]]
 * CPU Time: [[Image(cputime_mict_map.png, 20%, title="main")]]
 * CPI: [[Image(cpi_mict_map.png, 20%, title="main")]]

Download the profiling file: attachment:orchidee_ol_16p_1t_2016-04-15_09-52.map

== Trunk vs MICT Comparison 11/04/2016 ==

 * Date: 11/04/2016
 * Machine: ADA
 * IOIPSL: production mode
 * Orchidee: production mode
 * Duration: 1 year
 * 16 cores
 * Forcing:
   * 1 degree
   * 3-hourly (3H)

Considerations:
 * MICT is at the same level of modifications as trunk revision 3346.
 * MICT uses '''parallel interpolation''' in the aggregate_2D subroutine.

=== Overview ===

[[Image(trunk_vs_mict_grouped.png, 20%, title="main")]]

Subroutines are grouped into 4 categories:
 * ioipsl: all subroutines related to the IOIPSL library
 * Top orchidee: subroutines taking more than 1% of the computing time
 * Interpolation: interpolation time spent in the aggregate_2D subroutine
 * Other orchidee: remaining Orchidee subroutines

=== Mict R3359 (gprof) ===

Flat profile obtained with gprof:
{{{
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative     self               self    total
 time    seconds  seconds     calls  Ks/call  Ks/call  name
 25.66   1383.92  1383.92   2245127     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  9.62   1902.84   518.92   3835809     0.00     0.00  mathelp_mp_moycum_index_
  9.18   2398.02   495.18   3835826     0.00     0.00  histcom_mp_histwrite_real_
  5.96   2719.41   321.39     17524     0.00     0.00  thermosoil_mp_thermosoil_cond_pft_
  3.81   2924.90   205.49     17520     0.00     0.00  hydrol_mp_hydrol_soil_
  3.62   3119.87   194.97    420480     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  3.59   3313.39   193.52     17524     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  3.11   3481.04   167.65       365     0.00     0.00  stomate_wet_ch4_pt_ter_wet2_mp_ch4_wet_flux_density_wet2_
  3.05   3645.33   164.29       365     0.00     0.00  stomate_wet_ch4_pt_ter_wet1_mp_ch4_wet_flux_density_wet1_
  2.92   3803.03   157.70       365     0.00     0.00  stomate_wet_ch4_pt_ter_wet3_mp_ch4_wet_flux_density_wet3_
  2.86   3957.34   154.31       365     0.00     0.00  stomate_wet_ch4_pt_ter_0_mp_ch4_wet_flux_density_0_
  2.74   4105.24   147.90       365     0.00     0.00  stomate_wet_ch4_pt_ter_wet4_mp_ch4_wet_flux_density_wet4_
  2.67   4249.50   144.26     17522     0.00     0.00  thermosoil_mp_thermosoil_coef_
  1.63   4337.37    87.87     17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  1.59   4423.39    86.02   2666157     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_
  1.57   4507.82    84.43        55     0.00     0.00  interpol_help_mp_aggregate_2d_
  1.37   4581.90    74.08     17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  1.36   4655.06    73.16     17520     0.00     0.00  stomate_mp_stomate_main_
  1.22   4720.59    65.53     17520     0.00     0.00  stomate_permafrost_soilcarbon_mp_microactem_
  1.06   4777.86    57.27     17520     0.00     0.00  hydrol_mp_hydrol_main_
  0.96   4829.85    51.99   1602027     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  0.77   4871.20    41.35     17522     0.00     0.00  thermosoil_mp_thermosoil_readjust_
  0.74   4911.35    40.15   2664512     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_i1_
}}}

Total simulation time: 5358 seconds

IO (mathelp + histcom): 25.66 + 9.62 + 9.18 = ~44.5%

=== Trunk R3346 ===

Flat profile obtained with gprof:
{{{
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative     self               self    total
 time    seconds  seconds     calls  Ks/call  Ks/call  name
 22.26    441.54   441.54         7     0.06     0.06  interpol_help_mp_aggregate_2d_
 14.52    729.66   288.12   2171415     0.00     0.00  histcom_mp_histwrite_real_
 13.26    992.66   263.00     17520     0.00     0.00  hydrol_mp_hydrol_soil_
 10.28   1196.56   203.90    773813     0.00     0.00  mathelp_mp_ma_fuscat_r21_
  5.07   1297.17   100.61   2171397     0.00     0.00  mathelp_mp_moycum_index_
  4.16   1379.77    82.60     17520     0.00     0.00  diffuco_mp_diffuco_trans_co2_
  3.81   1455.34    75.57     17520     0.00     0.00  hydrol_mp_hydrol_diag_soil_
  3.67   1528.21    72.87    157680     0.00     0.00  hydrol_mp_hydrol_soil_coef_
  2.29   1573.69    45.48   1400412     0.00     0.00  mathelp_mp_ma_fuscat_r11_
  2.27   1618.76    45.07     17520     0.00     0.00  hydrol_mp_hydrol_main_
  1.86   1655.66    36.90     17521     0.00     0.00  thermosoil_mp_thermosoil_getdiff_
  1.46   1684.63    28.97     17521     0.00     0.00  thermosoil_mp_thermosoil_humlev_
  0.99   1704.17    19.54    157680     0.00     0.00  hydrol_mp_hydrol_soil_tridiag_
  0.94   1722.82    18.65     17520     0.00     0.00  stomate_litter_mp_littercalc_
  0.92   1740.99    18.17     17520     0.00     0.00  hydrol_mp_hydrol_split_soil_
  0.86   1758.10    17.11     17520     0.00     0.00  stomate_mp_stomate_main_
  0.81   1774.07    15.98   1133588     0.00     0.00  mod_orchidee_omp_transfert_mp_gather_omp_r1_
}}}

Total simulation time: 1956 seconds

IO (histcom + mathelp): 14.52 + 10.28 + 5.07 = ~30%

== Trunk vs MICT Comparison 18/02/2016 ==

On 18/02/2016, trunk revision 2916 and MICT revision 3161 were considered equivalent. The same run.def file is used for both developments.
The simulations were carried out under the following conditions:
 * 1 year
 * Global
 * CRU-NCEP v5.3.2 (6-hourly)
 * Machine: CURIE
 * IO library: IOIPSL
 * IOIPSL compilation mode: production
 * Orchidee compilation mode: production

In the tables below, times are given as XhYY (hours and minutes); bare numbers are minutes.

=== Mict R3527 ===

Time table:

||= N procs =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||= 256 =||
||= 0.5 deg =|| >16h39 (322 days simulated) || 11h09 || 7h06 || 4h50 || 3h31 || 2h47 ||
||= 1 deg =|| 4h10 || 2h14 || 1h20 || 52 || 37 || 30 ||
||= 2 deg =|| - || - || - || - || - || - ||

=== Mict R3161 ===

Time table:

||= N procs =||= 4 =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||
||= 0.5 deg =|| Memory error || >16h39 (322 days simulated) || 13h00 || 8h46 || 6h35 || 5h38 ||
||= 1 deg =|| 6h37 || 4h20 || 2h36 || 1h45 || 1h21 || 1h08 ||
||= 2 deg =|| 1h40 || 56 || 35 || 24 || 19 || 16 ||

Note: the 0.5 deg run on 4 processes did not start because of its memory requirements. The 0.5 deg run on 8 processes could not finish within the maximum wall-clock time allowed on the HPC machine; it stopped at simulation day 322. Both values can be estimated by extrapolation.

=== Trunk R2916 ===

The same simulations, with the same options, were carried out with the following results:

||= N procs =||= 4 =||= 8 =||= 16 =||= 32 =||= 64 =||= 128 =||
||= 0.5 deg =|| 8h38 || 5h31 || 3h26 || 2h23 || 1h48 || 1h31 ||
||= 1 deg =|| 2h07 || 1h17 || 47 || 32 || 25 || 21 ||
||= 2 deg =|| 38 || 19 || 11 || 8 || 6 || 5 ||
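The timing tables above can be turned into strong-scaling figures. The Python sketch below uses only the 1 deg rows of the Mict R3161 and Trunk R2916 tables, because they are complete for both codes. It assumes that "XhYY" means X hours and YY minutes and that bare numbers are minutes. Since no serial run exists, speedup is measured relative to the smallest (4-process) run. The helper names `to_minutes` and `scaling` are illustrative, not part of Orchidee or its tooling.

```python
"""Speedup and parallel efficiency from the 1 deg rows of the timing tables.

Assumption: "XhYY" means X hours YY minutes and bare numbers are minutes.
Speedup is taken relative to the smallest (4-process) run, not a serial run.
"""

PROCS = [4, 8, 16, 32, 64, 128]

# 1 deg rows copied from the "Mict R3161" and "Trunk R2916" tables.
MICT_R3161_1DEG = ["6h37", "4h20", "2h36", "1h45", "1h21", "1h08"]
TRUNK_R2916_1DEG = ["2h07", "1h17", "47", "32", "25", "21"]


def to_minutes(cell):
    """Convert a table cell such as '6h37' or '47' to minutes."""
    cell = cell.strip()
    if "h" in cell:
        hours, minutes = cell.split("h")
        return int(hours) * 60 + int(minutes or 0)
    return int(cell)


def scaling(procs, cells):
    """Return (procs, minutes, speedup, efficiency) tuples relative to the first run."""
    times = [to_minutes(c) for c in cells]
    p0, t0 = procs[0], times[0]
    rows = []
    for p, t in zip(procs, times):
        speedup = t0 / t
        efficiency = speedup / (p / p0)  # 1.0 means perfect strong scaling
        rows.append((p, t, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for name, cells in (("MICT R3161", MICT_R3161_1DEG),
                        ("Trunk R2916", TRUNK_R2916_1DEG)):
        print(name)
        for p, t, s, e in scaling(PROCS, cells):
            print(f"  {p:4d} procs {t:4d} min  speedup {s:5.2f}  efficiency {e:5.0%}")
    print("MICT R3161 / Trunk R2916 wall-clock ratio")
    for p, m, t in zip(PROCS, MICT_R3161_1DEG, TRUNK_R2916_1DEG):
        print(f"  {p:4d} procs  {to_minutes(m) / to_minutes(t):.2f}")
```

On these 1 deg rows the MICT/trunk ratio stays between roughly 3.1 and 3.4 at every processor count, and both codes reach a similar efficiency at 128 processes. This suggests that the gap comes from the work done per process rather than from poorer parallel scaling.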