
Version 5 (modified by jpolcher, 8 years ago)

--

Benchmark of ORCHIDEE using 3 different drivers for off-line simulations

Configurations

These benchmarks were run over two domains in order to sample problems of different sizes. ORCHIDEE is always configured in the same way, and the same executable is used in both sets of benchmarks.

                         Euro-Mediterranean   Global
Length of experiment     1 year               1 year
nb lon. x nb lat.        164 x 168            720 x 360
Number of land points    15909                94742
Size of history file     41M                  225M

The version of the model used is the last version of ORCHIDEE-DRIVER before the merge (May 2016).

The IO was configured to minimise the output while still using IOIPSL. The sizes of the output files are given in the table above.

The routing is activated. It is relevant here as it generates some MPI exchanges.

Computer and compiler details

A dedicated node of Climserv (merlin5) was used. Thus the simulations were not competing for memory access with other applications.

The model was compiled with PGF 2013 and OpenMPI 1.6.5. The following modules were loaded for the execution:

  • module load pgi/2013
  • module load openmpi/1.6.5-pgf2013
  • module load netcdf4/4.3.3.1-pgf2013
  • module load oasis3-mct-nc43/2.0-pgf2013

Results

Real Time

The real time (as well as the user and system times) was measured with the time command, which wrapped the mpirun used to run the model.

The graphic shows only the impact of parallel processing on the model and already hints that the fastest driver for the Euro-Mediterranean region is the old one. The OASIS coupling between the driver and ORCHIDEE is slower than the subroutine call used by the new and old drivers.
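The same three numbers can also be collected from a script, which may be convenient when looping over processor counts. The following is only a minimal sketch, assuming a Unix system; the function name time_command and the executable name in the usage comment are hypothetical and are not taken from the benchmark setup.

```python
import resource
import subprocess
import time


def time_command(cmd):
    """Run cmd and return (real, user, sys) in seconds, similar to `time`."""
    # CPU time already accumulated by earlier, finished child processes.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    # CPU time after the command (and the children it waited for) finished.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system


# Hypothetical usage (the executable name is an assumption):
# real, user, system = time_command(["mpirun", "-np", "4", "./orchidee_driver"])
```

As with the time command, the user and system times are summed over all processes started by the command, so they can exceed the real time on a parallel run.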

User Time

The results for the real time are confirmed and better illustrated by the user time returned by UNIX's time command.

Speed-up

The speed-up was computed relative to the 2-processor case, as this is the only common reference case: the OASIS driver cannot be used with only 1 CPU.

It shows that the best speed-up is achieved by the new driver. To understand this, one has to know that the new driver reads the forcing on one processor and then scatters the information to the other processors using MPI. In the old driver, all processors read the netCDF file to obtain the data they need.
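A speed-up relative to the 2-processor case can be computed as T(2) / T(n), where T(n) is the time measured on n processors. The sketch below illustrates this; the function name speedup is hypothetical and the timings are placeholder values, not measurements from this benchmark.

```python
def speedup(times, ref_procs=2):
    """Return {n_procs: T(ref_procs) / T(n_procs)} for the measured times."""
    t_ref = times[ref_procs]
    return {n: t_ref / t for n, t in sorted(times.items())}


# Placeholder timings in seconds, NOT results of this benchmark.
example_times = {2: 100.0, 4: 60.0, 8: 40.0}
result = speedup(example_times)
print(result[8])  # 100.0 / 40.0 = 2.5
```

With this reference, a perfectly scaling run on n processors would reach a speed-up of n / 2 rather than n.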
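To illustrate the read-then-scatter approach, the sketch below computes the bookkeeping a reading process would need before a scatter of the MPI_Scatterv kind: how many land points each process receives and at which offset its block starts. It assumes an even, contiguous split, which may differ from the decomposition ORCHIDEE actually uses; the function name scatter_layout and the choice of 8 processes are hypothetical.

```python
def scatter_layout(n_points, n_procs):
    """Return (counts, displs) splitting n_points as evenly as possible."""
    base, extra = divmod(n_points, n_procs)
    # The first `extra` ranks receive one additional point each.
    counts = [base + 1 if rank < extra else base for rank in range(n_procs)]
    # Offset of each rank's block within the full array held by the reader.
    displs = [sum(counts[:rank]) for rank in range(n_procs)]
    return counts, displs


# 15909 is the number of land points of the Euro-Mediterranean domain.
counts, displs = scatter_layout(15909, 8)
```

In such a scheme only one process touches the forcing file, and the cost of distributing the data shows up as MPI time instead of file access.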

Two indicators suggest that this can explain the better speed-up of the new driver.

The above figure displays the CPU time taken by MPI. It is clear that, as the number of CPUs increases, the new driver spends more time in data transfer. In this graphic the OASIS driver does not count the MPI exchanges needed for the forcing data, as they are part of OASIS and not of ORCHIDEE.

Another hint of the slow-down caused by multiple processors accessing the same netCDF file for the forcing can be seen in the system time returned by UNIX's time command.

As we increase the number of CPUs in the old driver, the system time increases, while in the new driver it remains flat.
