Changes between Version 8 and Version 9 of Documentation/UserGuide/HangCrash


Timestamp:
2020-02-28T17:28:37+01:00 (4 years ago)
Author:
peylin
Comment:

--

  • Documentation/UserGuide/HangCrash

    v8 v9  
    11= How to find where the model is hanging = 
     2 
     3Author: S. Luyssaert[[BR]] 
     4Last check: 2020/02/28, P. Peylin  
     5 
     6[[PageOutline]] 
     7 
     8== Objectives == 
     9 
     10This page explains how to determine whether your model run is hanging or is still running properly when you have not yet obtained the final outputs you expect. 
     11 
     12'''Context:'''  
     13 
    214You launch the model in a parallel run and you know from previous runs that the run should take, say, 600 seconds. After 1200 seconds the model is still running. That looks suspicious! A likely cause of this problem is that one processor is hanging and thus preventing the model from finishing or crashing properly. Here is some advice: 
    315 
    416=== Check whether the model really hangs === 
    5 Open the Script_Output file and search for RUN_DIR. You should find a path that looks like /ccc/scratch/cont003/dsm/p529grat/RUN_DIR/XXX/XXX. This is where the model is actually running. If you are working on irene/jean-zay or ciclad you can simply go to that folder and check when the most recent changes were made and to which files. The time of the last changes should give you an indication of whether the model really hangs or whether you are just too impatient. If, however, you are working on OBELIX, the run directory is on the /scratch but the folder where the model is running is not accessible. Open the Job you want to run and search for RUN_DIR_PATH. The instruction will be commented out. This is a good place to specify the run directory you want to use, e.g., RUN_DIR_PATH=/scratch01/sluys/RUN_DIR. Delete the job that was hanging, launch it again and have a look in /scratch01/sluys/RUN_DIR. Details can be found at https://forge.ipsl.jussieu.fr/igcmg_doc/wiki/DocEsetup#Mainjobofthesimulation 
     17Open the Script_Output file and search for RUN_DIR. You should find a path that looks like /ccc/scratch/cont003/dsm/p529grat/RUN_DIR/XXX/XXX. This is where the model is actually running. If you are working on irene, jean-zay or ciclad, you can simply go to that folder and check which files changed most recently and when. The time of the last change indicates whether the model is really hanging or whether you are just too impatient. If, however, you are working on OBELIX, the run directory is on /scratch, but the folder where the model is running is not accessible. Open the Job you want to run and search for RUN_DIR_PATH. The instruction will be commented out. This is a good place to specify the run directory you want to use, e.g., RUN_DIR_PATH=/scratch01/sluys/RUN_DIR. Delete the job that was hanging, launch it again, and look in /scratch01/sluys/RUN_DIR. Details can be found at https://forge.ipsl.jussieu.fr/igcmg_doc/wiki/Doc/Setup  
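The check on the run directory described above can be sketched in a few lines of Python. This is only an illustration, not part of the model tooling: the function names `recent_files` and `looks_stalled` and the 600-second idle threshold are assumptions made for this example. It walks the run directory, sorts files by the time of their last modification, and reports whether anything changed recently.

```python
import os
import time
from pathlib import Path


def recent_files(run_dir, limit=5):
    """Return up to `limit` (age_in_seconds, path) pairs, newest first."""
    now = time.time()
    entries = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = Path(root) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                # The file disappeared while we were scanning; skip it.
                continue
            entries.append((now - mtime, str(path)))
    # Smallest age first, i.e. most recently modified file first.
    entries.sort(key=lambda entry: entry[0])
    return entries[:limit]


def looks_stalled(run_dir, max_idle_seconds=600):
    """Return True if no file under run_dir changed within max_idle_seconds.

    An empty directory is also reported as stalled, since nothing was written.
    The threshold is a hypothetical default; choose it from your usual run time.
    """
    newest = recent_files(run_dir, limit=1)
    if not newest:
        return True
    return newest[0][0] > max_idle_seconds
```

For example, `looks_stalled("/scratch01/sluys/RUN_DIR")` returns True when nothing in that folder has been modified for more than 600 seconds, and `recent_files(...)` shows which files were touched last. A long idle time is only a hint that the model hangs; some steps may legitimately write nothing for a while.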
    618 
    719=== Allow the model to properly crash ===