Opened 12 days ago
Last modified 12 days ago
#202 new defect
Reduce the volume of client and server error and output files
Reported by: | hshepherd | Owned by: | hshepherd |
---|---|---|---|
Priority: | major | Component: | XIOS |
Version: | 2.0 | Keywords: | |
Cc: |
Description (last modified by hshepherd)
XIOS produces two text output files (.out and .err) for each MPI rank of the client and server. On Lustre file systems, large numbers of small files can cause performance problems. However, these output files are currently the only way to obtain performance statistics.
We want to reduce the number of output files but still get useful performance statistics (averages, min, max, std dev, etc.) to gain insight into what is going on with the IO server.
Development
source:XIOS2/dev/hshepherd/reduce_output_log
Change History (3)
comment:1 Changed 12 days ago by hshepherd
- Description modified (diff)
comment:2 Changed 12 days ago by hshepherd
comment:3 Changed 12 days ago by hshepherd
- Description modified (diff)
Overview
The output is reduced to two (or three) pairs of .out and .err files:
This new functionality can be selected by adding the following parameter to the xios context of the iodef.xml file
If this variable is unset or set to false, behavior remains as normal.
Timing output
Whilst we want to reduce the number of output files, we would like the timing statistics to remain, as these are important for investigations into file system and model configuration performance.
We report a summary of the waiting times and ratios for the clients, and of the processing times and percentage ratios for the servers, and calculate the min/max/mean/stddev of each. The data is collated via MPI, which is possible because each client pool and server pool has its own intraComm communicator.
Caveat for the client: The ratio might be slightly skewed. The full XIOS init/Finalize time presented in the output includes the MPI finalize command for communicator 'intraComm'. The summary cannot include that step, because we must still use MPI to gather the times from all ranks.
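The min/max/mean/stddev summary described above can be sketched as follows. This is only an illustration in Python, not the XIOS implementation: the names `Summary` and `summarise` and the sample figures are hypothetical, and the per-rank values are assumed to have already been gathered onto one rank over intraComm. The sketch also assumes the population standard deviation, since every rank in the pool contributes a value.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    minimum: float
    maximum: float
    mean: float
    stddev: float


def summarise(values):
    """Return min, max, mean and population std dev of per-rank values."""
    if not values:
        raise ValueError("need at least one rank")
    n = len(values)
    mean = math.fsum(values) / n
    variance = math.fsum((v - mean) ** 2 for v in values) / n
    return Summary(min(values), max(values), mean, math.sqrt(variance))


# Hypothetical per-rank figures, as they might look after a gather.
wait_times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # seconds spent waiting
total_times = [100.0] * len(wait_times)                 # total seconds per rank

wait_summary = summarise(wait_times)
# Ratio of waiting time to total time, computed per rank before summarising.
ratio_summary = summarise([w / t for w, t in zip(wait_times, total_times)])

print(wait_summary)
print(ratio_summary)
```

Summarising the per-rank ratios, rather than dividing the summed wait time by the summed total time, keeps the spread between ranks visible, which is the point of reporting min, max and stddev alongside the mean.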