wiki:Documentation/UserGuide/DebugCoupled

Version 4 (modified by jgipsl, 10 years ago)

--

If you've never really used LMDZ before, coupling is a scary event, even if you know ORCHIDEE quite well. Thankfully, there are some tips to make things a bit easier.

Tip 1) If the job crashes immediately (i.e., within a day), check the coupling between the two models: which variables are passed and what their values are. This can be found in intersurf_gathered, in src_sechiba/intersurf.f90. If you change the flag check_INPUTS in this file to .TRUE. and recompile, the run will write a set of useful files to the run directory, of the format W*.nc (Walb_nir.nc, Wfluxlat.nc, etc.), one for each variable passed to ORCHIDEE or passed from ORCHIDEE. You can look at these with the standard NetCDF tools for each step to see if the values are unusual. To determine what is "usual", I do a one-day run at the same resolution with the trunk and all the options I'm interested in (11-layer hydrology, STOMATE, new physics, etc.), with the check_INPUTS flag enabled. This gives me an identical set of files to compare against.
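
Before opening individual files, it can help to compare the two sets of W*.nc files mechanically. The sketch below is only an illustration: the function name compare_w_files and its two directory arguments are hypothetical, and it relies on cmp, so it only reports that two files are not byte-identical. It says nothing about whether the values are unreasonable.

```shell
# compare_w_files RUN_DIR REF_DIR   (hypothetical helper, not part of ORCHIDEE)
# Reports W*.nc files that exist in only one directory, and pairs of
# files with the same name whose contents are not byte-identical.
compare_w_files() {
    run_dir=$1
    ref_dir=$2
    # Files written by the run under investigation
    for f in "$run_dir"/W*.nc; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=$(basename "$f")
        if [ ! -e "$ref_dir/$name" ]; then
            echo "only in run: $name"
        elif ! cmp -s "$f" "$ref_dir/$name"; then
            echo "differs: $name"
        fi
    done
    # Files that only the reference run produced
    for f in "$ref_dir"/W*.nc; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        [ -e "$run_dir/$name" ] || echo "only in reference: $name"
    done
}
```

Call it as compare_w_files followed by the run directory and the directory holding the reference files. A file that appears in only one set points to a variable that is not being exchanged in one of the runs. Files reported as differing still have to be inspected with the NetCDF tools (for instance ncdump).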

Question: where is the run directory? It is not the submit directory (config/, where the job is submitted from), and it is not the archive directory (IGCM_OUT/, where the output files are stored). It is a temporary directory to which all the files are copied and in which execution takes place. If you look in the Script_* output file in the submit directory of a job, you can find a line which looks something like this (on Curie):

IGCM_sys_Cd : /ccc/scratch/cont003/dsm/p529grat/RUN_DIR/1213660_24665/DOFOCO.24665

This is the run directory. It is deleted if the run completes successfully, but not deleted if a crash happens. You can also find this directory based on the job submission ID.

$SCRATCHDIR/RUN_DIR/number_id_job_......

Here, number_id_job is the number given by the system when you submit the job.

For example,

[p86cozic@curie70 loig.default]$ ccc_msub Job_loig.default
Submitted Batch Session 1213759

The RUN_DIR for this job will be: $SCRATCHDIR/RUN_DIR/1213759_*

The origin of the second set of numbers (the *) is unclear, but the job ID is unique, so there should only be one directory which starts with that number.
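
The two ways of locating the run directory described above can be sketched as small shell helpers. Both function names are hypothetical. The first assumes that the line of interest in the Script_* file contains both "IGCM_sys_Cd" and "RUN_DIR", as in the Curie example. The second assumes that run directories live under $SCRATCHDIR/RUN_DIR unless another base directory is given.

```shell
# run_dir_from_script SCRIPT_FILE   (hypothetical helper)
# Prints the path from the last "IGCM_sys_Cd : <path>" line that
# mentions RUN_DIR in the given Script_* output file.
run_dir_from_script() {
    grep 'IGCM_sys_Cd' "$1" | grep 'RUN_DIR' | tail -n 1 \
        | sed 's/^.*IGCM_sys_Cd *: *//'
}

# run_dir_from_jobid JOB_ID [BASE]   (hypothetical helper)
# Lists the directories under BASE whose name starts with "JOB_ID_".
# BASE defaults to $SCRATCHDIR/RUN_DIR (an assumption about the layout).
run_dir_from_jobid() {
    base=${2:-$SCRATCHDIR/RUN_DIR}
    ls -d "$base/$1"_* 2>/dev/null
}
```

With the job from the example above, run_dir_from_jobid 1213759 should print the single directory whose name starts with that job ID, provided the run crashed and the directory was therefore not deleted.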

Tip 2) Turn on debug flags for ORCHIDEE and LMDZ

For ORCHIDEE, there are a couple of options, which involve changing util/AA_make.gdef. On Curie, I generally make my own, but there is a line which already exists in the svn version of the file. All you need to do is change

#-Q- curie  F_O = -DCPP_PARA -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) -fp-model precise
######-Q- curie  F_O = -DCPP_PARA -p -g -traceback -fp-stack-check -ftrapuv -check bounds $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR)

to

######-Q- curie  F_O = -DCPP_PARA -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR) -fp-model precise
#-Q- curie  F_O = -DCPP_PARA -p -g -traceback -fp-stack-check -ftrapuv -check bounds $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR)

To take these changes into account, you have to rerun the script util/ins_make.
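
The swap of the two curie lines in AA_make.gdef can also be scripted. This is a minimal sketch, assuming the two lines look exactly like the ones shown above (the active line carries -O3, the disabled line carries "-check bounds"). The function name enable_curie_debug is hypothetical. It writes the modified file to standard output rather than editing in place, so the result can be reviewed first.

```shell
# enable_curie_debug GDEF_FILE   (hypothetical helper)
# Prints GDEF_FILE with the optimised curie F_O line disabled
# ("#-Q-" becomes "######-Q-") and the debug curie F_O line enabled
# ("######-Q-" becomes "#-Q-"). All other lines are left untouched.
enable_curie_debug() {
    sed -e '/^#-Q- curie *F_O = .* -O3 /s/^#-Q-/######-Q-/' \
        -e '/^######-Q- curie *F_O = .* -check bounds /s/^######-Q-/#-Q-/' \
        "$1"
}
```

For example, redirect the output of enable_curie_debug util/AA_make.gdef to a temporary file, check it, copy it over the original, and then rerun util/ins_make as described above.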

For LMDZ, you can change the main Makefile directly in config/LMDZOR_v5.2/. Comment out the ce0l command and add -debug to the gcm command, changing

lmdz: ../../modeles/LMDZ
    (cd ../../modeles/LMDZ; ./makelmdz_fcm -d $(RESOL_LMDZ) -cosp true -v true -parallel mpi -arch $(FCM_ARCH) ce0l ; cp bin/ce0l_$(RESOL_LMDZ)_phylmd_para_orch.e ../../bin/create_etat0_limit.e ; )
    (cd ../../modeles/LMDZ; ./makelmdz_fcm -d $(RESOL_LMDZ) -cosp true -v true -parallel mpi -arch $(FCM_ARCH) gcm ; cp bin/gcm_$(RESOL_LMDZ)_phylmd_para_orch.e ../../bin/gcm.e ; )

to have

lmdz: ../../modeles/LMDZ
#    (cd ../../modeles/LMDZ; ./makelmdz_fcm -d $(RESOL_LMDZ) -cosp true -v true -parallel mpi -arch $(FCM_ARCH) ce0l ; cp bin/ce0l_$(RESOL_LMDZ)_phylmd_para_orch.e ../../bin/create_etat0_limit.e ; )
    (cd ../../modeles/LMDZ; ./makelmdz_fcm -d $(RESOL_LMDZ) -cosp true -v true -debug -parallel mpi -arch $(FCM_ARCH) gcm ; cp bin/gcm_$(RESOL_LMDZ)_phylmd_para_orch.e ../../bin/gcm.e ; )

and then type "gmake clean && gmake". This recompiles ORCHIDEE as well as LMDZ with debug options.

Be careful with this: if you run ins_make again, you will lose the "-debug" flag in the Makefile and will have to re-add it. If you want to keep -debug as a permanent option, add it in AA_make instead.

The actual debug flags appear to be stored in modeles/LMDZ/arch/arch-X64_CURIE.fcm. Also note that if you forget to comment out the first line (the ce0l command), every compilation will take 30 minutes or so, since LMDZ will compile multiple times due to the differences in flags!
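
Since the -debug flag is easy to lose and mixed flags are expensive, a quick check of the Makefile before compiling may be worthwhile. The sketch below uses a hypothetical function name, check_lmdz_makefile, and assumes that every LMDZ build command in the Makefile contains the string makelmdz_fcm and that commented lines start with "#".

```shell
# check_lmdz_makefile MAKEFILE   (hypothetical helper)
# Succeeds only if every uncommented makelmdz_fcm command uses -debug.
check_lmdz_makefile() {
    # Uncommented build commands only
    active=$(grep -v '^[[:space:]]*#' "$1" | grep 'makelmdz_fcm' || true)
    if [ -z "$active" ]; then
        echo "no active makelmdz_fcm line"
        return 1
    fi
    total=$(printf '%s\n' "$active" | grep -c 'makelmdz_fcm' || true)
    with_debug=$(printf '%s\n' "$active" | grep -c -- ' -debug ' || true)
    if [ "$with_debug" -eq 0 ]; then
        echo "-debug missing"
        return 1
    elif [ "$with_debug" -ne "$total" ]; then
        # Same situation as forgetting to comment out the ce0l line
        echo "mixed flags: $with_debug of $total active lines use -debug"
        return 1
    fi
    echo "ok: $total active line(s), all with -debug"
}
```

Run it on the Makefile in config/LMDZOR_v5.2/ after any call to ins_make. A "mixed flags" message corresponds to the slow case described above, where LMDZ is compiled more than once with different flags.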