How to check whether two (netcdf) files are identical

Authors: S. Luyssaert and A.S. Lansø

Last revised: 2020/02/28, P. Maugis

cdo diffv

Rather than comparing plots, it is faster and more precise to check whether two netcdf files (e.g. a history or restart file from two model versions) are numerically identical. If cdo is available (e.g. on obelix), you can use the following command:

cdo diffv   path_file_1   path_file_2 > output_file_name.txt

The attached script differ100.sh by Josefine Ghattas performs the same comparison in a convenient way.

The comparison is easier if the two netcdf files contain the same variables in the same order. However, the cdo diffv command ignores 5-dimensional variables, so not all variables in the restart files can be compared with this method.
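Whether two files contain the same variables in the same order can be checked before running the comparison. The following is a minimal sketch, assuming cdo is installed; it relies on the cdo showname operator, and the function name same_variables is only illustrative.

```shell
#!/bin/bash
# Sketch: report whether two netcdf files list the same variables in the same order.
# Assumption: cdo is on the PATH; same_variables is a hypothetical helper name.
same_variables() {
    local names1 names2
    # cdo showname prints the variable names of a file; -s silences extra messages
    names1=$(cdo -s showname "$1") || return 2
    names2=$(cdo -s showname "$2") || return 2
    if [ "$names1" = "$names2" ]; then
        echo "same variables, same order"
    else
        echo "variable lists differ"
        return 1
    fi
}

# Example call:
# same_variables path_file_1 path_file_2
```

The function returns 0 when the lists match, 1 when they differ, and 2 when cdo fails on one of the files.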

ADVANTAGE: the output file tells you which fields are different. Be aware, though, that this method works best for smaller netcdf files. If your history file is larger than a few megabytes, the output text file may reach many hundreds of megabytes. In that case, the md5sum command may be a better option.

DISADVANTAGE: only works for netcdf files, and only for variables with fewer than 5 dimensions.
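If the size of the report is the main concern, one option is to keep only its beginning. The sketch below assumes cdo is installed; the helper name diffv_capped and the default of 1000 lines are arbitrary choices for illustration. Note that truncating the report also drops whatever cdo prints at the end of it.

```shell
#!/bin/bash
# Sketch: keep only the first max_lines lines of the cdo diffv report.
# Assumptions: cdo is on the PATH; diffv_capped and the 1000-line default are hypothetical.
diffv_capped() {
    local file1=$1 file2=$2 report=$3 max_lines=${4:-1000}
    local lines
    # head stops reading after max_lines lines, so the report file stays small
    cdo diffv "$file1" "$file2" | head -n "$max_lines" > "$report"
    lines=$(wc -l < "$report")
    echo "kept $((lines)) lines in $report"
}

# Example call:
# diffv_capped path_file_1 path_file_2 output_file_name.txt 500
```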

md5sum

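If you expect the files to be identical (bit by bit), you can use:

#!/bin/bash
# Read each file from stdin so that the file name is not written next to the checksum
md5sum < path_file1 > sum1
md5sum < path_file2 > sum2
# cmp -s prints nothing and reports the result through its exit status
cmp -s sum1 sum2
echo $?

The first two commands create a signature string for each file and write it to the files 'sum1' and 'sum2' (which are thus created or overwritten). Because md5sum prints the name of the file after its checksum, the files are read from standard input here; otherwise 'sum1' and 'sum2' would differ whenever the two paths differ. cmp -s is silent and only sets its exit status, which the last line prints: 0 if the files are identical, 1 otherwise.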

ADVANTAGE: works for all files.

DISADVANTAGE: you only know whether the files are identical or not. If not, you have no idea which fields are different.
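The same check can be wrapped in a small function that avoids the intermediate files sum1 and sum2. This is a minimal sketch; the function name same_md5 and the file names in the example call are hypothetical.

```shell
#!/bin/bash
# Sketch: compare two files by MD5 checksum without writing sum1/sum2.
# same_md5 is a hypothetical helper name.
same_md5() {
    local sum1 sum2
    # Reading from stdin keeps the file name out of md5sum's output
    sum1=$(md5sum < "$1") || return 2
    sum2=$(md5sum < "$2") || return 2
    # Exit status 0 if the checksums match, 1 otherwise
    [ "$sum1" = "$sum2" ]
}

# Example call (hypothetical file names):
# if same_md5 run_a_restart.nc run_b_restart.nc; then echo "identical"; else echo "different"; fi
```

A return value of 2 signals that one of the files could not be read, which keeps that case distinct from a genuine difference.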

Matlab

The Matlab function nccmp can compare all variables contained in two netcdf files. The original version can be found here. I have made some small modifications so that the information produced by the script is written to a file instead of being printed to the screen. The updated version can be found here.

Sadly, Matlab is not available on obelix, only on IRENE. To open Matlab on IRENE, type Matlab; if you wish to run it from the terminal, type matlab -nodesktop.

Next run the function by typing:

NCCMP(ncfile1, ncfile2, tolerance, forceCompare)

tolerance sets how much variation you allow in the variables between the two files. We want identical files, so put [] here.

forceCompare can be set to True or False:

  • True - write all occurrences of differences in a variable (specifically, all the indices) to the file 'all_diff.txt'.
  • False - for each variable that differs, write only the first occurrence of a difference to the file 'first_diff.txt'.

For global simulations, the True option can produce a large file whose information is hard to process if there are many differences between the compared restart files. In addition, the True option makes the script much slower. For small simulations, however, the True option is very useful.

I recommend that you use the re-ordered files from the differ100.sh script as inputs to nccmp.
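The nccmp call can also be issued from a shell without opening an interactive Matlab session. The sketch below is an assumption-laden illustration: it supposes that matlab is on the PATH, that the modified nccmp.m is on the Matlab path under the lowercase name nccmp, and that forceCompare accepts Matlab's lowercase true/false. The wrapper name run_nccmp is hypothetical.

```shell
#!/bin/bash
# Sketch: run the nccmp comparison from the shell.
# Assumptions: matlab is on the PATH, nccmp.m is on the Matlab path,
# and forceCompare is given as Matlab's lowercase true or false.
run_nccmp() {
    local file1=$1 file2=$2 force=${3:-false}
    # [] is the empty tolerance, i.e. the files are required to be identical;
    # the trailing exit closes Matlab once the comparison has finished
    matlab -nodesktop -nosplash -r "nccmp('$file1', '$file2', [], $force); exit"
}

# Example call (hypothetical file names):
# run_nccmp run_a_restart.nc run_b_restart.nc false
```

If nccmp raises an error, the trailing exit is not reached and Matlab stays at its prompt, so the session then has to be closed by hand.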
