Opened 12 days ago

Last modified 12 days ago

#202 new defect

Reduce the volume of client and server error and output files

Reported by: hshepherd
Owned by: hshepherd
Priority: major
Component: XIOS
Version: 2.0
Keywords:
Cc:

Description (last modified by hshepherd)

XIOS produces two text output files (the .out and .err files) for each MPI rank of client and server. On Lustre file systems, large numbers of small files can degrade performance. However, these output files are currently the only way to obtain performance statistics.

We want to reduce the number of output files but still get useful performance statistics (averages, min, max, std dev, etc.) to gain insight into what is going on with the IO server.

Development
source:XIOS2/dev/hshepherd/reduce_output_log

Change History (3)

comment:1 Changed 12 days ago by hshepherd

  • Description modified (diff)

comment:2 Changed 12 days ago by hshepherd

Overview

The output is reduced to two (or three) pairs of .out and .err files:

  • First client rank
  • First server rank
  • First rank of level two server pool

This new functionality can be selected by adding the following parameter to the xios context of the iodef.xml file:

<variable id="reduce_logging" type="bool">true</variable>

If this variable is unset or set to false, the behavior is unchanged.
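The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not the XIOS implementation: the function name ranks_that_log, its arguments, and the rank numbers in the usage note are hypothetical, and "first rank" is assumed here to mean the lowest rank number in each group.

```python
def ranks_that_log(client_ranks, server_ranks, level2_pool_ranks=(),
                   reduce_logging=False):
    """Return the set of ranks that keep their .out/.err pair.

    Hypothetical helper: rank groups are plain iterables of integers.
    """
    groups = [list(client_ranks), list(server_ranks), list(level2_pool_ranks)]
    if not reduce_logging:
        # Flag unset or false: every rank writes its own pair, as before.
        return {rank for group in groups for rank in group}
    # Flag true: keep only the first (assumed: lowest) rank of each
    # non-empty group, giving two or three pairs of files.
    return {min(group) for group in groups if group}
```

For example, with clients on ranks 0 to 7, servers on 8 and 9, and a level two pool on 10 and 11, the reduced set is {0, 8, 10}. Without a level two pool only two ranks remain.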

Timing output

Whilst we want to reduce the number of output files, we would like the timing statistics to remain, as these are important for investigations into file system and model configuration performance.

We report a summary of the waiting times and ratios for the clients, and of the processing time and percentage ratio for the servers, giving the min, max, mean and standard deviation of each. The data is collated via MPI, which is possible because each client pool and server pool has its own intraComm communicator.

  • For the clients the statistics are presented in the lowest ranked client's .out file.
  • For a single level of servers we gather statistics for all servers and write them to the output file of server rank 0.
  • For two levels of servers we gather statistics for the level 1 servers and for each pool of level two servers individually; these are written to server rank 0 (for level 1) and to the lowest ranked server in each level two pool.
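The grouping described in the list above can be illustrated with a small Python sketch. It stands in for the real code, which gathers the values over each pool's intraComm communicator: here the per-rank times are simply passed in as a dictionary. The names summarise and summarise_pools, the pool labels, and the choice of population standard deviation are assumptions made for the example, since the ticket does not say which estimator is used.

```python
import statistics


def summarise(values):
    """Min, max, mean and (population) standard deviation of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),  # assumption: population form
    }


def summarise_pools(times_by_rank, pool_of_rank):
    """Summarise times per pool, keyed by the pool's lowest rank.

    times_by_rank: {rank: time in seconds}
    pool_of_rank:  {rank: pool label}, e.g. "level1" or "level2_a"
    """
    pools = {}
    for rank, seconds in times_by_rank.items():
        pools.setdefault(pool_of_rank[rank], []).append((rank, seconds))

    report = {}
    for members in pools.values():
        # The lowest ranked member of each pool is the one that writes.
        writer = min(rank for rank, _ in members)
        report[writer] = summarise([seconds for _, seconds in members])
    return report
```

Calling summarise_pools with one pool label for all servers corresponds to the single level case, where everything is reported by the lowest server rank. With several labels, each pool gets its own summary on its own lowest rank.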

Caveat for the client: the ratio might be slightly skewed. The full XIOS init/Finalize time presented in the output includes the MPI finalize command for communicator 'intraComm'. The gathered statistics cannot include that time, because MPI is still needed to gather the times from all ranks.

comment:3 Changed 12 days ago by hshepherd

  • Description modified (diff)