20 | | == Step by step manual == |
21 | | === Activate the debug flags when compiling the orchidee_ol executable === |
22 | | |
23 | | The goal of this step is to include debugging markers in the executable, which let the debugger identify which line of the source code is being processed at every step of the execution. Follow the instructions for compiling ORCHIDEE, XIOS and IOIPSL in debug mode on the following page: [wiki:Documentation/UserGuide/flags How to get starting debugging]. |
24 | | |
25 | | |
26 | | |
27 | | === Launch the executable within the debugger === |
28 | | |
29 | | Start the executable from a folder of your choice that contains a run.def file with all the options you need, including the location of the forcing files, the restart files, etc. |
30 | | |
31 | | In the present example, we will run the code up to the first execution of line 278 of sechiba.f90 that matches a condition on the time step, print the value of lai_max at that point, continue until the next execution of that line, look at lai_max again and then finish the execution of orchidee_ol. |
32 | | |
33 | | First, we copy the newly compiled executable (see above) into the execution folder. Check that all the restart and forcing files are correctly referred to in the run.def and make sure that all restart files written by the previous run have been removed from the execution folder. |
34 | | {{{ |
35 | | user@computer:~/my_execution_folder>cp ~/my_orchidee_install/bin/orchidee_ol . |
36 | | user@computer:~/my_execution_folder>ls -l |
37 | | orchidee_ol run.def driver_start.nc sechiba_start.nc stomate_start.nc |
38 | | }}} |
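
The check for leftover restart files can be scripted. The sketch below is only an illustration: the function name leftover_restarts is hypothetical, and it assumes that the restart files written by a run match the pattern *restart.nc, as in the listings on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of ORCHIDEE): print the name of every file
# matching *restart.nc in the given folder (default: current folder).
# Assumption: restart outputs follow the *restart.nc naming used on this page.
leftover_restarts() {
    dir="${1:-.}"
    for f in "$dir"/*restart.nc; do
        # An unmatched glob stays literal, so check that the file really exists.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

Calling leftover_restarts ~/my_execution_folder before launching the debugger should print nothing if the folder is clean. The initial state files (driver_start.nc, sechiba_start.nc, stomate_start.nc) do not match the pattern and are left alone.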
39 | | |
40 | | Then launch the executable within the debugger environment. |
41 | | If you are working from within the lab, the graphical interface may be more comfortable. |
42 | | Type: |
43 | | {{{ |
44 | | user@computer:~/my_execution_folder>idb ./orchidee_ol |
45 | | Intel(R) Debugger for applications running on Intel(R) 64 |
46 | | Reading symbols from ~/my_execution_folder/orchidee_ol...done. |
47 | | }}} |
48 | | |
49 | | From outside the lab, it is more efficient to use only the command line: |
50 | | {{{ |
51 | | user@computer:~/my_execution_folder>idbc ./orchidee_ol |
52 | | }}} |
53 | | |
54 | | We set a first breakpoint (a line of the code at which we want the execution to halt) and run the executable up to that point. |
55 | | Here we add a condition on a variable (kjit) so that the execution stops only when the time step kjit==5800 is reached. |
56 | | {{{ |
57 | | (idb)break sechiba.f90:278 if (kjit == 5800) |
58 | | (idb)run |
59 | | }}} |
60 | | Then we have a look at a variable defined within the present subroutine, here lai_max (info locals lists all of them): |
61 | | {{{ |
62 | | (idb)print lai_max |
63 | | }}} |
64 | | ...continue the execution of the code step by step: |
65 | | {{{ |
66 | | (idb)step |
67 | | }}} |
68 | | |
69 | | Follow the evolution of the variable: display prints its value again every time the execution stops: |
70 | | {{{ |
71 | | (idb)display lai_max |
72 | | }}} |
73 | | |
74 | | ...look at the code around the current position |
75 | | {{{ |
76 | | (idb)list |
77 | | }}} |
78 | | |
79 | | Now we can change the condition on 'kjit' for the first breakpoint (the only one defined, as can be checked by running info b or info break). Then we continue the execution of the code until the new condition is met: |
80 | | {{{ |
81 | | (idb)cond 1 (kjit==5900) |
82 | | (idb)cont |
83 | | }}} |
84 | | |
85 | | We could also simply remove this condition and continue the execution of the code until the next execution of line 278 of sechiba.f90: |
86 | | {{{ |
87 | | (idb)cond 1 |
88 | | (idb)cont |
89 | | }}} |
90 | | |
91 | | ...show the list of breakpoints: |
92 | | {{{ |
93 | | (idb)info breakpoints |
94 | | }}} |
95 | | |
96 | | ...remove all breakpoints to finish the execution of the code and quit the debugger |
97 | | {{{ |
98 | | (idb)delete break |
99 | | (idb)cont |
100 | | (idb)quit |
101 | | user@computer:~/my_execution_folder> |
102 | | user@computer:~/my_execution_folder>ls -l |
103 | | orchidee_ol run.def driver_start.nc sechiba_start.nc stomate_start.nc driver_restart.nc sechiba_restart.nc stomate_restart.nc out_orchidee.txt |
104 | | }}} |
105 | | |
106 | | ...and one may start again from the beginning after having removed the restart files |
107 | | {{{ |
108 | | user@computer:~/my_execution_folder>rm -f *restart.nc |
109 | | user@computer:~/my_execution_folder>idb ./orchidee_ol |
110 | | }}} |
111 | | |
112 | | Rather than leaving the debugger and launching it again, we could have removed the restart files from inside the debugger and restarted the execution from the beginning: |
113 | | {{{ |
114 | | (idb)shell rm -f *restart.nc |
115 | | (idb)run |
116 | | }}} |
117 | | |
118 | | Note also that shortcuts exist for most of the commands (r for run, s for step, b for break,...). Using them speeds things up a little. |
119 | | |
120 | | |
121 | | === Other elements of syntax === |
122 | | |
123 | | 1. Inline help |
124 | | * general menu |
125 | | {{{ |
126 | | (idb)help |
127 | | }}} |
128 | | * on a specific command |
129 | | {{{ |
130 | | (idb)help info |
131 | | }}} |
132 | | * or subcommand |
133 | | {{{ |
134 | | (idb)help info breakpoints |
135 | | }}} |
136 | | 2. List |
137 | | * all the available commands |
138 | | {{{ |
139 | | (idb)complete |
140 | | }}} |
141 | | * all the commands starting with a given command name |
142 | | {{{ |
143 | | (idb)complete info |
144 | | }}} |
145 | | * local variables defined at different levels of the call stack, for example in dim2_driver, which is always the outermost one (frame -1), or in the innermost one (frame 0), for instance sechiba.f90 if we have just stopped at the breakpoint defined above. |
146 | | {{{ |
147 | | (idb)bt |
148 | | (idb)frame 1 |
149 | | (idb)info locals |
150 | | (idb)frame 0 |
151 | | (idb)info locals |
152 | | }}} |
153 | | 3. Magic |
154 | | * Change the value of one variable and continue the run afterwards to check whether it solves the problem. |
155 | | {{{ |
156 | | (idb)set variable fluxsens(1)=1 |
157 | | }}} |
158 | | |
159 | | 4. Reference sheet |
160 | | * [attachment:gdb-quickref.pdf Quick reference sheet]. Also available on the net http://www.scribd.com/doc/3589/gdb-quickref |
161 | | |
162 | | |
163 | | |
164 | | = Totalview = |
165 | | |
166 | | Totalview is a GUI debugger that works very well with MPI/OpenMP binaries. It is installed on large HPC systems such as Curie or ADA. |
167 | | |
168 | | == Curie == |
169 | | |
170 | | === SPMD === |
171 | | |
172 | | To run Totalview, first make it available by loading its module: |
173 | | {{{ |
174 | | module load totalview |
175 | | }}} |
176 | | Then run your simulation in interactive mode so that you can use the debugger interface: |
177 | | {{{ |
178 | | ccc_mprun -n 16 -p standard -A <your project id> -d tv ./orchidee_ol |
179 | | }}} |
180 | | This call requests 16 processes in the standard queue. The option -d tv selects Totalview as the debugger. |
181 | | |
182 | | When the startup window shows up, select "Enable memory debugging". |
183 | | |
184 | | === MPMD === |
185 | | |
186 | | The script below shows how to run an MPI-OpenMP program (e.g. coupled LMDZ + Orchidee) with Totalview: |
187 | | |
188 | | {{{ |
189 | | #!/bin/bash |
190 | | #MSUB -r mictgrm_test # Request name |
191 | | #MSUB -n 64 # Number of tasks to use |
192 | | #MSUB -c 2 # Number of cores per task (one per OpenMP thread) |
193 | | #MSUB -T 4000 # Elapsed time limit in seconds |
194 | | #MSUB -o orchid_%I.o # Standard output. %I is the job id |
195 | | #MSUB -e orchid_%I.e # Error output. %I is the job id |
196 | | #MSUB -Q normal |
197 | | #MSUB -D |
198 | | #MSUB -X |
199 | | #MSUB -A gen6328 |
200 | | #MSUB -q standard |
201 | | |
202 | | |
203 | | set -x |
204 | | module unload netcdf hdf5 |
205 | | module load netcdf/4.3.3.1_hdf5_parallel |
206 | | module load hdf5/1.8.9_parallel |
207 | | module load totalview |
208 | | |
209 | | #module load ddt |
210 | | #unset SLURM_SPANK_AUKS |
211 | | |
212 | | # enable core dump file in case of error |
213 | | ulimit -c unlimited |
214 | | |
215 | | export KMP_STACKSIZE=3g |
216 | | export KMP_LIBRARY=turnaround |
217 | | export MKL_SERIAL=YES |
218 | | export OMP_NUM_THREADS=2 |
219 | | |
220 | | cat << END > pp.conf |
221 | | 4 ./xios.x |
222 | | 60 totalview ./lmdz.x |
223 | | END |
224 | | |
225 | | mpirun -tv -n 4 ./xios.x : -n 60 ./lmdz.x |
226 | | }}} |
227 | | |
228 | | In this specific case, the coupled LMDZ + Orchidee run uses 60 MPI processes for LMDZ and 4 MPI processes for XIOS, each with 2 OpenMP threads. In total, it requires (60 + 4) x 2 = 128 cores. |
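
The resource count can be checked with a little shell arithmetic before adapting the script to another configuration. This is only a sketch; the variable names are chosen for illustration and the values are those of the example above.

```shell
#!/bin/sh
# Values taken from the example script above; adapt them to your own setup.
lmdz_tasks=60        # MPI processes running lmdz.x
xios_tasks=4         # MPI processes running xios.x
threads_per_task=2   # OpenMP threads per MPI process (#MSUB -c)

# Total number of MPI tasks, i.e. the value given to #MSUB -n
total_tasks=$((lmdz_tasks + xios_tasks))
# Total number of cores reserved by the job
total_cores=$((total_tasks * threads_per_task))

echo "MPI tasks (-n): $total_tasks"
echo "Cores in total: $total_cores"
```

If the number of LMDZ or XIOS processes changes, the value of #MSUB -n and the arguments of mpirun have to be changed consistently.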
229 | | |
230 | | When the startup window shows up, select "Enable memory debugging". |
231 | | |
232 | | == Notes == |
233 | | |
234 | | Totalview is not able to debug a binary compiled with the -p flag, which is only needed for profiling. For that reason, the flag must be removed from the compilation options. |
235 | | |
236 | | If you compile Orchidee with the makeorchidee_fcm tool, make sure to remove -p from the %DEBUG_FFLAGS line of the arch.fcm file: |
237 | | |
238 | | arch.fcm: |
239 | | {{{ |
240 | | %DEBUG_FFLAGS -fpe0 -p -O0 -g -traceback -fp-stack-check -ftrapuv -check bounds -check all |
241 | | }}} |
242 | | to |
243 | | {{{ |
244 | | %DEBUG_FFLAGS -fpe0 -O0 -g -traceback -fp-stack-check -ftrapuv -check bounds -check all |
245 | | }}} |
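
Whether the flag is still present can be tested from the shell. The sketch below is an assumption-laden illustration: the function name has_profiling_flag is hypothetical, and it assumes that the debug flags sit on a single line starting with %DEBUG_FFLAGS, as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of makeorchidee_fcm): succeed if the
# %DEBUG_FFLAGS line of the given arch.fcm file contains a standalone -p flag.
# Flags that merely contain "-p", such as -fp-stack-check, are not matched.
has_profiling_flag() {
    grep '^%DEBUG_FFLAGS' "$1" | grep -Eq '(^|[[:space:]])-p([[:space:]]|$)'
}
```

For instance, has_profiling_flag arch.fcm && echo "remove -p before using Totalview" prints the reminder only while the flag is still set.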
246 | | |
247 | | Make sure the gnu/4.8.1 module is not loaded, either when compiling the source code or when running the executable. Unload it with: |
248 | | {{{ |
249 | | module unload gnu/4.8.1 |
250 | | }}} |
251 | | |
252 | | = Allinea DDT (Orchidee + XIOS) on Curie = |
253 | | |
254 | | DDT is a parallel debugger available on the Curie HPC system. |
255 | | This section describes how to run a parallel Orchidee simulation with XIOS in server mode. |