Version 17 (modified by luyssaert, 4 years ago)
How to check whether two (netcdf) files are identical
Author: S.Luyssaert and A.S. Lansø
Last revised: 2020/02/28, P. Maugis
Added diffnc: 2020/12/23, S. Luyssaert
diffnc
Rather than comparing plots, it is faster and more precise to check whether two netcdf files (e.g. a history or restart file from two model versions) are numerically identical. cdo diffv and nccmp (Matlab) seem to rely on outdated HDF5 libraries. Another tool that worked well is diffnc. Just download the diffnc script, make it executable, and run it.
curl -L https://github.com/spinto/diffnc/raw/master/diffnc -o diffnc
chmod +x diffnc
./diffnc -h
See https://github.com/spinto/diffnc for details.
cdo diffv
If available (i.e. on obelix), you can use the following command:
cdo diffv path_file_1 path_file_2
The comparison only works if the files contain the same variables in the same order. Otherwise the cdo command will return error=100. If one of the files contains more variables, the attached script differr100.sh by Josefine Ghattas can be used. This script checks whether the variable names differ and asks the user to remove variables, so that the command "cdo diffv" can be applied. It first checks variables of type float and, if no differences are found, it checks variables of type double. The script can therefore be used on both diagnostic history files and restart files from ORCHIDEE. Use the script in the following way:
./differr100.sh path_file_1 path_file_2
Variables with 5 dimensions are ignored by the cdo diffv command, so not all variables in the restart files can be compared with this method.
ADVANTAGE: the output file tells you which fields are different. Be aware, though, that this method works best for smaller netCDF files. If your history file is more than a few megabytes, the output text file may be many hundreds of megabytes. In that case, the md5sum command may be a better option.
DISADVANTAGE: only works for netcdf files, and for arrays of rank lower than 4.
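Before running cdo diffv, it can help to see which variable names occur in one file but not in the other. The following is a minimal sketch of such a check, not the code of differr100.sh. The function name names_missing_from and the files names1.txt and names2.txt are hypothetical; the name lists (one variable name per line) are assumed to have been produced beforehand, for instance with cdo showname.

```shell
#!/bin/bash
# Hypothetical helper, not part of differr100.sh.
# Print every name listed in file $1 that does not appear in file $2.
# Both files are assumed to hold one variable name per line. With cdo
# installed, such a list could be produced with, for example:
#   cdo -s showname path_file_1 | tr -s ' ' '\n' | sed '/^$/d' > names1.txt
names_missing_from() {
    # -F: fixed strings, -x: match whole lines, -v: keep non-matching lines,
    # -f: read the names to exclude from file $2.
    # grep exits with status 1 when nothing is missing.
    grep -F -x -v -f "$2" "$1"
}

# Usage (hypothetical file names):
#   names_missing_from names1.txt names2.txt   # only in the first file
#   names_missing_from names2.txt names1.txt   # only in the second file
```

Any name printed by either call points to a variable that has to be removed or added before cdo diffv can compare the two files.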
md5sum
If you expect the files to be identical (bit by bit), you can use
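#!/bin/bash
md5sum < path_file1 > sum1
md5sum < path_file2 > sum2
cmp -s sum1 sum2
echo $?
The first two commands write a signature string for each file to 'sum1' and 'sum2' (which are created or overwritten). Each file is read from standard input so that the file name, which md5sum otherwise prints next to the checksum, does not end up in the signature. cmp -s prints nothing; its exit status, shown by echo $?, is 0 if the files are identical and 1 otherwise.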
ADVANTAGE: works for all files.
DISADVANTAGE: you only know whether the files are identical or not. If not, you have no idea which fields are different.
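The md5sum check can also be wrapped in a small shell function that avoids the intermediate files. This is a minimal sketch; the function name same_md5 is hypothetical and not part of any existing tool.

```shell
#!/bin/bash
# Hypothetical helper: succeed (status 0) if two files have the same MD5
# checksum, fail with status 1 if they differ, and with status 2 if one of
# the files cannot be read.
same_md5() {
    # Reading from standard input keeps the file name out of the digest
    # line, so the two strings differ only when the contents differ.
    sum1=$(md5sum < "$1") || return 2
    sum2=$(md5sum < "$2") || return 2
    [ "$sum1" = "$sum2" ]
}

# Usage:
#   if same_md5 path_file1 path_file2; then
#       echo "identical"
#   else
#       echo "different"
#   fi
```

As with the commands above, the result only says whether the files are bit-by-bit identical, not which fields differ.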
Matlab
The Matlab function nccmp can compare all variables contained in two netcdf files. The original version can be found here. Pascal Maugis has made some small modifications so that the information produced by the script is written to a file instead of being printed to the screen. The updated version can be found here.
Sadly, Matlab is not available on obelix, only on IRENE. To open Matlab on IRENE, type Matlab or, if you wish to run it from the terminal, type matlab -nodesktop.
Next run the function by typing:
NCCMP(ncfile1, ncfile2, tolerance, forceCompare)
tolerance sets how much variation in the variables is allowed between the two files. We want identical files, so put [] here.
forceCompare can be set to True or False.
- True - for each variable that differs, write all occurrences of differences (specifically, all the indices) to the file 'all_diff.txt'.
- False - for each variable that differs, write only the first occurrence of a difference to the file 'first_diff.txt'.
For global simulations, the True option can produce a large file, and the information may be hard to process if there are many differences between the compared restart files. In addition, the True option makes the script much slower. However, for small simulations the True option is very useful.
It is recommended to use the re-ordered files from the differr100.sh script as inputs to nccmp.
Attachments (2)
- differr100.sh (1.5 KB) - added by pmaugis 5 years ago.
- nccmp_obelix.m (13.9 KB) - added by pmaugis 5 years ago.