= Problem when running in parallel with several processors on obelix =

Author: B. Guenet \\
Last revision: 2020/06/16, B. Guenet

== Question by Laura Sereni on 2020/06/11 ==

I tried to launch a simulation over Europe with 36 processors (nodes=6:ppn=6, i.e. 6 nodes with 6 processors per node) and it failed with the following error message:

{{{
[obelix22:73775] [[50269,0],0] usock_peer_send_blocking: send() to socket 57 failed: Broken pipe (32)
[obelix22:73775] [[50269,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316

[obelix22:73775] [[50269,0],0]-[[50269,1],3] usock_peer_accept: usock_peer_send_connect_ack failed
}}}

== Answer by Fabienne Maignan ==

On obelix, it is not recommended to work on the tmp disk when requesting several processors. Therefore, in the job script, you have to change RUN_DIR_PATH from

{{{
#RUN_DIR_PATH=/workdir/or/scratchdir/of/this/machine
}}}

to

{{{
RUN_DIR_PATH=/home/diskname/mylogin/RUNDIR
}}}

where diskname must be replaced by the disk on which you want to run (e.g. orchidee01, surface7) and mylogin by your login. Note that the line must also be uncommented (the leading `#` removed). You also have to create the RUNDIR directory before running the job.
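
The preparation step can be sketched in shell as follows. This is only a minimal sketch under assumptions: `prepare_rundir` is a hypothetical helper (not an obelix or ORCHIDEE tool), its third argument exists only to make it testable, and orchidee01 is just one of the example disk names above.

```shell
#!/bin/sh
# Minimal sketch (assumption): create the RUNDIR on a /home disk before
# submitting the job and print its path, so that it can be used on the
# uncommented RUN_DIR_PATH line of the job script.
# prepare_rundir is a hypothetical helper, not part of any obelix tool.
prepare_rundir() {
  disk=$1                       # disk name, e.g. orchidee01 or surface7
  login=${2:-$(id -un)}         # your login; defaults to the current user
  root=${3:-/home}              # hypothetical override, only useful for testing
  dir="$root/$disk/$login/RUNDIR"
  mkdir -p "$dir" || return 1   # the RUNDIR must exist before the job starts
  printf '%s\n' "$dir"
}
```

For example, `RUN_DIR_PATH=$(prepare_rundir orchidee01)` creates /home/orchidee01/<your login>/RUNDIR and stores its path; the same path then goes on the RUN_DIR_PATH line of the job.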

'''Once the simulation has finished, it is very important to clean the RUNDIR to avoid unnecessary storage of forcing files!'''
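
A possible way to do this cleaning is sketched below, assuming that the outputs you want to keep have already been copied out of RUNDIR. `clean_rundir` is a hypothetical helper: it refuses any path that does not end in /RUNDIR, reports the space used with `du -sh`, then deletes everything inside the directory with `find ... -mindepth 1 -delete` while keeping RUNDIR itself for the next run.

```shell
#!/bin/sh
# Minimal sketch (assumption): empty a RUNDIR once the simulation is over and
# any outputs you need have been copied elsewhere. clean_rundir is hypothetical.
clean_rundir() {
  dir=$1
  # Safety net: only touch directories whose path ends in /RUNDIR.
  case "$dir" in
    */RUNDIR) ;;
    *) echo "clean_rundir: refusing to clean '$dir'" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "clean_rundir: '$dir' is not a directory" >&2; return 1; }
  du -sh "$dir"                      # show how much space is about to be freed
  find "$dir" -mindepth 1 -delete    # remove the contents, keep RUNDIR itself
}
```

For example, `clean_rundir /home/orchidee01/$(id -un)/RUNDIR` empties the RUNDIR created for a run on the orchidee01 disk.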