
How to get started debugging and compiling with debug options

Author: M. McGrath
Last revision: D. Goll 2020/04/20

You are running ORCHIDEE, just like every other day, when it stops for no apparent reason. You don't have output files from the simulation, and the run.card lists "Fatal". What can you do?

First of all, check that your problem is not related to libIGCM, such as a missing input file or the wall clock limit (your job exceeded its allocated computing time). More information is available at https://forge.ipsl.jussieu.fr/igcmg_doc/wiki/Doc/CheckDebug

Compile in debug mode

In order to pin down exactly what and where the problem is, you can recompile ORCHIDEE with debug flags. These flags enable extra checks on code execution to identify unwanted behavior.

The page More about compile methods describes how to compile with debug options. In short, for newer configurations (such as ORCHIDEE_3 or LMDZOR_v6.2 and newer), compilation is done with a script compile_X.sh, and adding the argument -debug activates the debug options.

Why aren't these checks enabled by default? Speed. Consider a one-dimensional array with ten elements, ARRAY(1:10). Checking that an access is correct is simple: verify that the requested element number lies between 1 and 10. The Fortran standard does require every subscript to be within bounds (see, for example, Section 6.5.3 of the Fortran 2008 standard: "The value of a subscript in an array element shall be within the bounds for its dimension."), but compilers do not check this unless asked to. For an array A(1:10,1:5), an access to A(11,1) will not necessarily crash: because Fortran stores arrays in column-major order, A(11,1) refers to the same memory location as A(1,2), which is still inside the memory allocated for the array, so the program silently reads or overwrites the wrong element. Performing such checks on every access takes time, especially if you want to know exactly which line is causing the crash. So do not be surprised if execution is much slower in debug mode, possibly by a factor of 10!
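The following Python sketch illustrates the out-of-bounds example above. It mimics how a compiler turns the subscripts of A(1:10,1:5) into a memory offset in column-major order, first without and then with a bounds check. The function names are hypothetical, and the code only illustrates the arithmetic; it is not how any particular Fortran compiler implements its checks.

```python
# Illustration only: hypothetical helpers mimicking Fortran column-major indexing.

def column_major_offset(indices, lower, upper):
    """Zero-based linear offset of a Fortran array element, with NO bounds
    check (roughly what compiled code computes when checking is off)."""
    offset = 0
    stride = 1
    for i, lo, hi in zip(indices, lower, upper):
        offset += (i - lo) * stride
        stride *= hi - lo + 1  # the first index varies fastest
    return offset


def checked_offset(indices, lower, upper):
    """Same computation, but first verify each subscript lies within its
    bounds, as a bounds-checking debug build would."""
    for dim, (i, lo, hi) in enumerate(zip(indices, lower, upper), start=1):
        if not lo <= i <= hi:
            raise IndexError(
                f"subscript {dim} has value {i}, outside bounds {lo}:{hi}")
    return column_major_offset(indices, lower, upper)


lower, upper = (1, 1), (10, 5)  # A(1:10, 1:5)
size = 10 * 5                   # number of elements allocated

off = column_major_offset((11, 1), lower, upper)
print(off, off < size)  # 10 True: still inside the array's memory
print(off == column_major_offset((1, 2), lower, upper))  # True: same slot as A(1,2)

try:
    checked_offset((11, 1), lower, upper)
except IndexError as err:
    print("caught:", err)  # caught: subscript 1 has value 11, outside bounds 1:10
```

The unchecked version happily returns a valid-looking offset, which is why the error goes unnoticed. The extra comparison in checked_offset has to run on every array access, which is one reason debug builds are slower.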

Therefore, the first step is often to set up conditions that reproduce your crash in the shortest possible CPU time: reduce the spatial domain and use restart files to start the simulation the day before the crash. A short run also lets you use high-frequency output to look at what happens just before the problem, and lets you reduce the number of processors, which makes everything easier to follow.

To achieve this, adapt your config.card file following http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/Doc/Setup#config.card and http://forge.ipsl.jussieu.fr/igcmg_doc/wiki/Doc/Setup#Setupinitialstateforthesimulation

When your problem is solved, don't forget to remove the -debug argument and recompile, or your runs will take forever!

Other useful tips for debugging

  • If the problem is suspected to be related to ORCHIDEE, the PRINTLEV flag, which controls the amount of diagnostic text output from ORCHIDEE, can be very useful in finding out which routine the code is crashing in. If you can't get a line number any other way, set PRINTLEV as high as possible, find the last line printed out, and then identify the next line that should have been printed. The crash must be happening somewhere between those two lines.
  • If the problem is suspected to be related to ORCHIDEE reading .nc files, you can set l_dbg = .TRUE. in errioipsl.f90 to get more information.
  • If the problem is suspected to be related to XIOS, for example when writing output variables, you can set the following variables in iodef.xml to get more information printed out from XIOS:
      <variable id="info_level" type="int">100</variable>
      <variable id="print_file" type="bool">true</variable>
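  • And if you are up against a bug that seems to change every single time you run the code, even with all the above flags on, you might want to try Valgrind, whose Memcheck tool can detect memory errors such as reads of uninitialized values and invalid memory accesses.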
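The bracketing idea from the PRINTLEV tip can be sketched in Python as follows. It assumes you have collected, in program order, the messages the code is expected to print (for example by reading the write statements in the routines you suspect) and the lines actually found in the output before the crash. The function name, message strings and log lines are hypothetical placeholders, not real ORCHIDEE output.

```python
def bracket_crash(expected, log_lines):
    """Return (last_seen, next_expected) for a crashed run.

    expected:  messages the code should print, in program order
    log_lines: lines actually written to the output before the crash
    The crash happened after last_seen was printed and before
    next_expected would have been. Either value is None at the ends.
    """
    found = 0  # how many expected messages have been matched, in order
    for line in log_lines:
        # Substring match, since real log lines may carry prefixes or padding
        if found < len(expected) and expected[found] in line:
            found += 1
    last_seen = expected[found - 1] if found > 0 else None
    next_expected = expected[found] if found < len(expected) else None
    return last_seen, next_expected


# Hypothetical messages and log, for illustration only
expected = ["enter routine_a", "enter routine_b",
            "enter routine_c", "leave routine_c"]
log = ["step 1", " enter routine_a", " enter routine_b", "  diagnostics..."]

print(bracket_crash(expected, log))
# ('enter routine_b', 'enter routine_c')
```

This assumes each message appears once in the lines you pass in; if the code prints the same messages at every time step, pass only the lines written during the last step.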
Last modified on 2020-05-29T16:10:21+02:00