Changes between Version 1 and Version 2 of Documentation/UserGuide/DDTmap


Timestamp: 2020-03-19T17:51:40+01:00 [[BR]]
Author: luyssaert

Authors: Pierre Brender [[BR]]
Last revision: P. Brender (2019/05/02) [[BR]]

[[PageOutline]]

= Debugging with DDT Allinea Map =

== Objective ==
Background: when the model does not run as expected, the calculations have to be checked step by step. Debuggers such as IDB let you inspect the values taken by variables at a given point of the execution, and check that a subroutine of interest is actually called (and how many times). This is very useful to quickly identify the first variable that takes aberrant values (particularly after a significant modification of the code) and the piece of code that causes the trouble. A basic alternative to a debugger is to add WRITE statements to the code.

== Boundary conditions for using a debugger ==
Getting a functional version of the code after modifying one of the ORCHIDEE routines still takes a few steps, and the debugger presented here only speeds up the second one:
   1. Get a version of the code that compiles. The first errors reported by the compiler before it stops should help you solve that issue.
   2. Once the modified code compiles properly, some variables often take aberrant values, even for offline runs on a single point. You are likely to be interested in this tutorial if you are used to tedious cycles of:
      * adding lines such as "PRINT *, 'MY_VAR=', my_var" to the subroutine of interest,
      * recompiling the code,
      * screening the standard output of the executed code.
   3. Check that the new feature does not lead to odd behaviour in global runs and/or runs coupled with the GCM.

= IDB =

== Step by step manual ==
=== Activate the debug flags when compiling the orchidee_ol executable ===

The goal of this step is to embed debugging information in the executable so that the debugger can tell which line of the source code is being executed at each step. Follow the instructions for compiling ORCHIDEE, XIOS and IOIPSL in debug mode on the page [wiki:Documentation/UserGuide/flags How to get started with debugging].
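Before launching the debugger, you may want to check that the executable really contains debugging information. The helper below is a hypothetical sketch, not part of ORCHIDEE: it assumes that readelf (GNU binutils) is available and relies on the fact that an ELF executable compiled with -g contains a .debug_info section.

{{{
# Hypothetical helper: succeeds (exit status 0) if the given file is an ELF
# binary with a .debug_info section, i.e. most likely compiled with -g.
# Requires readelf (GNU binutils).
has_debug_info() {
    # readelf -S lists the section headers; errors (missing or non-ELF file) are discarded
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Usage, from the execution folder:
# has_debug_info ./orchidee_ol || echo "orchidee_ol has no debug information: recompile in debug mode"
}}}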

=== Launch the executable within the debugger ===

Start the executable from the folder of your choice, in which you have placed a run.def file containing all the options you need and the locations of the forcing files, the restart files, etc.

In the present example, we run the code until line 278 of sechiba.f90 is executed at time step 5800, print the value of lai_max at that point, continue until a later execution of that line, print the value of lai_max again and then finish the execution of orchidee_ol.

First, copy the newly compiled executable (see above) into the execution folder. Check that all the restart and forcing files are correctly referenced in the run.def, and make sure that all the restart files written by the previous run have been removed from the execution folder.
{{{
user@computer:~/my_execution_folder>cp ~/my_orchidee_install/bin/orchidee_ol .
user@computer:~/my_execution_folder>ls -l
orchidee_ol   run.def driver_start.nc sechiba_start.nc stomate_start.nc
}}}
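The checks described above can be scripted. The following is a hypothetical sketch (the function name check_run_dir and its messages are not part of ORCHIDEE): it only verifies that a run.def file is present and that no *restart.nc file left over from a previous run remains in the folder, using the file names shown in the listing above.

{{{
# Hypothetical pre-run check for an ORCHIDEE execution folder.
# Returns 0 if run.def is present and no *restart.nc file is left over.
check_run_dir() {
    local dir=${1:-.}
    if [ ! -f "$dir/run.def" ]; then
        echo "missing $dir/run.def" >&2
        return 1
    fi
    # Any file matching *restart.nc was written by a previous run;
    # if nothing matches, the pattern stays literal and -e is false.
    for f in "$dir"/*restart.nc; do
        if [ -e "$f" ]; then
            echo "leftover restart file: $f (remove it before a new run)" >&2
            return 1
        fi
    done
    return 0
}

# Usage, from the execution folder:
# check_run_dir . && idbc ./orchidee_ol
}}}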

Then launch the executable within the debugger.
If you are working within the lab, the graphical interface may be more convenient. Type:
{{{
user@computer:~/my_execution_folder>idb ./orchidee_ol
Intel(R) Debugger for applications running on Intel(R) 64
Reading symbols from ~/my_execution_folder/orchidee_ol...done.
}}}

From outside the lab, it is more efficient to use only the command-line interface:
{{{
user@computer:~/my_execution_folder>idbc ./orchidee_ol
}}}

We set a first breakpoint (the line of the code at which we want the execution to halt) and run the executable up to that point.
Here we add a condition on the time-step variable kjit so that the execution stops only when time step kjit == 5800 is reached.
{{{
(idb)break sechiba.f90:278 if (kjit == 5800)
(idb)run
}}}
Then we print the value of lai_max at this point (info locals, described below, lists all the local variables of the present subroutine):
{{{
(idb)print lai_max
}}}
...continue the execution of the code step by step:
{{{
(idb)step
}}}

Have the value of the variable printed automatically every time the execution stops, to follow its evolution:
{{{
(idb)display lai_max
}}}

...look at the source code around the current position:
{{{
(idb)list
}}}

Now we can change the condition on kjit for the first breakpoint (the only one defined, as can be checked with info b or info break). Then we continue the execution of the code until the new condition is met:
{{{
(idb)cond 1 (kjit==5900)
(idb)cont
}}}

We could also simply remove this condition and continue the execution until the next execution of line 278 of sechiba.f90:
{{{
(idb)cond 1
(idb)cont
}}}

Show the list of breakpoints:
{{{
(idb)info breakpoints
}}}

...remove all breakpoints, let the execution of the code finish and quit the debugger:
{{{
(idb)delete break
(idb)cont
(idb)quit
user@computer:~/my_execution_folder>
user@computer:~/my_execution_folder>ls -l
orchidee_ol   run.def driver_start.nc sechiba_start.nc stomate_start.nc driver_restart.nc sechiba_restart.nc stomate_restart.nc out_orchidee.txt
}}}

...and one may start again from the beginning after removing the restart files:
{{{
user@computer:~/my_execution_folder>rm -f *restart.nc
user@computer:~/my_execution_folder>idb ./orchidee_ol
}}}

Rather than leaving the debugger and starting it again, we could have removed the restart files from within the debugger and restarted the execution from the beginning:
{{{
(idb)shell rm -f *restart.nc
(idb)run
}}}

Note also that shortcuts exist for most commands (r for run, s for step, b for break, etc.). Using them speeds things up a little further.
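The interactive session above can also be replayed non-interactively, which saves retyping the commands after each recompilation. The sketch below assumes GNU gdb, whose command syntax idb follows in its default -gdb mode; gdb's -batch and -x options read the commands from a file, but whether idbc accepts the same options has not been checked here. The function name write_session and the file name session.gdb are hypothetical.

{{{
# Hypothetical helper: write the session above into a gdb command file.
# Arguments: breakpoint location, first kjit value, second kjit value.
write_session() {
    cat > session.gdb << EOF
break $1 if kjit == $2
run
print lai_max
condition 1 kjit == $3
continue
print lai_max
delete
continue
EOF
}

# Usage (assuming gdb is installed and orchidee_ol was compiled in debug mode):
#   write_session sechiba.f90:278 5800 5900
#   rm -f *restart.nc
#   gdb -batch -x session.gdb ./orchidee_ol
}}}

In batch mode gdb does not ask for confirmation, so the final delete removes all breakpoints and the last continue runs the model to the end.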


=== Other elements of syntax ===

 1. Built-in help
   * general menu:
{{{
(idb)help
}}}
   * on a specific command:
{{{
(idb)help info
}}}
   * or subcommand:
{{{
(idb)help info breakpoints
}}}
 2. List
   * all the available commands:
{{{
(idb)complete
}}}
   * all the possible completions of a given command (here, the subcommands of info):
{{{
(idb)complete info
}}}
   * the local variables defined at different levels of the call stack, for example in dim2_driver, which is always the outermost frame (highest number in the bt output), or in the innermost one (frame 0), for instance sechiba.f90 if we have just stopped at the breakpoint defined above:
{{{
(idb)bt
(idb)frame 1
(idb)info locals
(idb)frame 0
(idb)info locals
}}}
 3. Magic
   * Change the value of a variable and continue the run to check whether this solves the problem:
{{{
(idb)set variable fluxsens(1)=1
}}}

 4. Reference sheet
   * [attachment:gdb-quickref.pdf Quick reference sheet]. Also available online: http://www.scribd.com/doc/3589/gdb-quickref


= TotalView =

TotalView is a GUI debugger that works very well with MPI/OpenMP binaries. It is installed on large HPC systems such as Curie or Ada.

== Curie ==

=== SPMD ===

To use TotalView, first load its module:
{{{
module load totalview
}}}
Then run your simulation in interactive mode so that you can use the debugger interface:
{{{
ccc_mprun -n 16 -p standard -A <your project id> -d tv ./orchidee_ol
}}}
This call requests 16 processes in the standard queue; -d tv selects TotalView as the debugger.

When the startup window shows up, select "Enable memory debugging".

=== MPMD ===

The script below shows how to run a hybrid MPI-OpenMP program (e.g. coupled LMDZ + ORCHIDEE) with TotalView:

{{{
#!/bin/bash
#MSUB -r mictgrm_test # Request name
#MSUB -n 64 # Number of tasks to use
#MSUB -c 2 # Number of cores per task (used by the OpenMP threads)
#MSUB -T 4000 # Elapsed time limit in seconds
#MSUB -o orchid_%I.o # Standard output. %I is the job id
#MSUB -e orchid_%I.e # Error output. %I is the job id
#MSUB -Q normal
#MSUB -D
#MSUB -X
#MSUB -A gen6328
#MSUB -q standard


set -x
module unload netcdf hdf5
module load netcdf/4.3.3.1_hdf5_parallel
module load hdf5/1.8.9_parallel
module load totalview

#module load ddt
#unset SLURM_SPANK_AUKS

# enable core dump file in case of error
ulimit -c unlimited

export KMP_STACKSIZE=3g
export KMP_LIBRARY=turnaround
export MKL_SERIAL=YES
# Export OMP_NUM_THREADS so that the launched processes inherit it; keep it equal to #MSUB -c
export OMP_NUM_THREADS=2

# MPMD configuration file (not read by the mpirun command below,
# which gives the same process counts with its ':' syntax)
cat << END > pp.conf
4    ./xios.x
60   totalview ./lmdz.x
END

# 4 XIOS processes and 60 LMDZ processes, launched under TotalView (-tv)
mpirun -tv -n 4 ./xios.x : -n 60 ./lmdz.x
}}}

In this specific case, the coupled LMDZ + ORCHIDEE model runs on 60 MPI processes and XIOS on 4 MPI processes, and each of these 64 tasks gets 2 cores for its OpenMP threads (#MSUB -c 2). In total, 64 × 2 = 128 cores are required.
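When changing the number of processes, the #MSUB values, OMP_NUM_THREADS and the mpirun line have to stay consistent. A minimal sketch of that arithmetic in shell, with hypothetical variable names and the values of the script above:

{{{
# Hypothetical variable names; the values are those of the script above.
NPROC_XIOS=4     # MPI processes for the XIOS servers
NPROC_LMDZ=60    # MPI processes for coupled LMDZ + ORCHIDEE
NTHREADS=2       # OpenMP threads per process (#MSUB -c and OMP_NUM_THREADS)

NTASKS=$((NPROC_XIOS + NPROC_LMDZ))   # value for #MSUB -n
NCORES=$((NTASKS * NTHREADS))         # total number of cores reserved

echo "#MSUB -n $NTASKS"
echo "#MSUB -c $NTHREADS"
echo "export OMP_NUM_THREADS=$NTHREADS"
echo "mpirun -tv -n $NPROC_XIOS ./xios.x : -n $NPROC_LMDZ ./lmdz.x"
echo "total cores: $NCORES"
}}}

With these values it prints #MSUB -n 64 and total cores: 128, matching the script.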

When the startup window shows up, select "Enable memory debugging".

== Notes ==

TotalView cannot debug a binary compiled with the -p flag (used only for profiling). This flag therefore has to be removed from the compilation options.

If you compile ORCHIDEE with the makeorchidee_fcm tool, make sure to remove it from the arch.fcm file, changing

arch.fcm:
{{{
%DEBUG_FFLAGS        -fpe0 -p -O0 -g -traceback -fp-stack-check -ftrapuv -check bounds -check all
}}}
to
{{{
%DEBUG_FFLAGS        -fpe0 -O0 -g -traceback -fp-stack-check -ftrapuv -check bounds -check all
}}}
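The change can also be made with sed. This is a sketch assuming that -p appears as a separate word on the %DEBUG_FFLAGS line; the helper name strip_profiling_flag is hypothetical, and the actual arch file name depends on your installation.

{{{
# Hypothetical helper: remove the profiling flag -p from the %DEBUG_FFLAGS line
# of the given arch file, keeping a backup copy with the .bak suffix.
strip_profiling_flag() {
    # Match " -p" followed by whitespace or end of line, so that flags such as
    # -fp-stack-check are left untouched; \1 restores the following space.
    sed -i.bak -E '/^%DEBUG_FFLAGS/s/[[:space:]]-p([[:space:]]|$)/\1/' "$1"
}

# Usage (file name is an example):
# strip_profiling_flag arch.fcm
}}}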

Make sure that the gnu/4.8.1 module is not loaded, both when compiling the source code and when running the executable. Unload it with:
{{{
module unload gnu/4.8.1
}}}
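To check this from a script, one option is the hypothetical helper below. It assumes that the module system (Environment Modules or Lmod) records the loaded modules in the colon-separated LOADEDMODULES environment variable, which is worth verifying on your machine.

{{{
# Hypothetical helper: succeeds if the given module (e.g. gnu/4.8.1) is loaded,
# based on the colon-separated LOADEDMODULES variable maintained by "module".
module_is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage before compiling or running:
# if module_is_loaded gnu/4.8.1; then module unload gnu/4.8.1; fi
}}}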

= Allinea DDT (ORCHIDEE + XIOS) on Curie =

DDT is a parallel debugger available on the Curie HPC system. This section describes how to run a parallel ORCHIDEE simulation with XIOS in server mode under DDT.

1- Load ddt and run in interactive mode
     


= Alternative debuggers =

== Open source debugger ==
The GNU debugger, a free and open-source alternative to idb:
   * gdb

A useful list of references for using gdb and idb (in -gdb mode, the default, as explained in the syntax paragraph):

However, gdb does not work very well on code compiled with ifort; use idbc in these situations.
