Problem when running in parallel with several processes on obelix
Author: B. Guenet
Last revision: 2020/06/16, B. Guenet
Question by Laura Sereni on 2020/06/11
I tried to launch a simulation over Europe with 36 procs (nodes=6:ppn=6) and it fails with the following error message:
[obelix22:73775] [[50269,0],0] usock_peer_send_blocking: send() to socket 57 failed: Broken pipe (32)
[obelix22:73775] [[50269,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316
[obelix22:73775] [[50269,0],0]-[[50269,1],3] usock_peer_accept: usock_peer_send_connect_ack failed
Answer by Fabienne Maignan
On obelix, working on the tmp disk is not recommended when requesting several processes. In the Job, you therefore have to change RUN_DIR_PATH from
#RUN_DIR_PATH=/workdir/or/scratchdir/of/this/machine
to
RUN_DIR_PATH=/home/diskname/mylogin/RUNDIR
where diskname must be replaced with the disk on which you want to run (e.g. orchidee01, surface7, etc.) and mylogin is your login. You must also create the RUNDIR directory before running.
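Creating the directory beforehand can be sketched as follows. This is only a minimal illustration: the function name prepare_rundir is hypothetical (it is not part of the Job or of any ORCHIDEE tool), and diskname and mylogin remain placeholders that you must replace yourself.

```shell
# Hypothetical helper (not part of the Job): create the run directory
# if it does not exist yet and check that it is writable.
prepare_rundir() {
    run_dir="$1"
    # -p creates missing parent directories and does not fail
    # when the directory already exists.
    mkdir -p "$run_dir" || return 1
    # Stop early if the directory cannot be written to.
    [ -w "$run_dir" ] || return 1
    # Print the path so that the caller can reuse it.
    printf '%s\n' "$run_dir"
}

# Example call, with the diskname and mylogin placeholders left as-is:
# prepare_rundir /home/diskname/mylogin/RUNDIR
```

Running the helper once from a login shell before submitting the Job should be enough; the path you pass must be the same as the one given to RUN_DIR_PATH.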
Once the simulation has finished, it is very important to clean the RUNDIR directory so that forcing files are not stored there unnecessarily!
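The cleaning step could look like the sketch below. It assumes GNU find (for -mindepth and -delete) and uses a hypothetical function name, clean_rundir; check that all the outputs you need have been copied elsewhere before running anything like it, since the deletion cannot be undone.

```shell
# Hypothetical helper: empty the run directory but keep the directory itself.
clean_rundir() {
    run_dir="$1"
    # Refuse an empty path or the filesystem root to avoid accidents.
    case "$run_dir" in
        ""|"/")
            echo "refusing to clean '$run_dir'" >&2
            return 1
            ;;
    esac
    # Nothing to do if the path is not an existing directory.
    [ -d "$run_dir" ] || return 1
    # Delete everything below run_dir (GNU find); -delete processes
    # directory contents before the directories themselves.
    find "$run_dir" -mindepth 1 -delete
}

# Example call, with the diskname and mylogin placeholders left as-is:
# clean_rundir /home/diskname/mylogin/RUNDIR
```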