= Problem when running in parallel with several procs on obelix

Author: B. Guenet \\
Last revision: 2020/06/16, B. Guenet

== Question by Laura Sereni on 2020/06/11

I tried to launch a simulation over Europe with 36 procs (nodes=6:ppn=6) and it fails with the following error message:

[obelix22:73775] [[50269,0],0] usock_peer_send_blocking: send() to socket 57 failed: Broken pipe (32)
[obelix22:73775] [[50269,0],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 316

[obelix22:73775] [[50269,0],0]-[[50269,1],3] usock_peer_accept: usock_peer_send_connect_ack failed


== Answer by Fabienne Maignan

On obelix, it is not recommended to work on the tmp disk when asking for several procs. In the Job, you therefore have to change the RUN_DIR_PATH from

#RUN_DIR_PATH=/workdir/or/scratchdir/of/this/machine

to

RUN_DIR_PATH=/home/diskname/mylogin/RUNDIR

where diskname must be replaced by the disk on which you want to run (e.g. orchidee01, surface7, etc.) and mylogin is your login. You also have to create the RUNDIR directory before running.

'''Once the simulation is finished, it is very important to clean the RUNDIR to avoid unnecessary storage of forcing files!'''
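
The two manual steps mentioned in the answer (creating the RUNDIR before running and cleaning it once the simulation is finished) could be scripted roughly as follows. This is only a sketch: the function names `rundir_path`, `prepare_rundir` and `clean_rundir` are hypothetical helpers, not part of any existing tool, and the `/home/diskname/mylogin/RUNDIR` layout is taken from the answer above.

```shell
# Hypothetical helpers (not part of any existing tool) for handling RUNDIR.

# Print the run directory path for a given disk and login.
# $1 = disk name (e.g. orchidee01), $2 = login,
# $3 = base directory (optional, assumed to be /home as in the answer above)
rundir_path() {
    printf '%s/%s/%s/RUNDIR\n' "${3:-/home}" "$1" "$2"
}

# Create the run directory; to be called before submitting the Job.
prepare_rundir() {
    mkdir -p -- "$1"
}

# Empty the run directory once the simulation is finished.
# The directory is removed and recreated, so any leftover forcing files are deleted.
clean_rundir() {
    # Refuse to act on an empty or non-existent path.
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        return 1
    fi
    rm -rf -- "$1" && mkdir -p -- "$1"
}

# Example usage (kept commented out so that sourcing this file changes nothing):
# RUNDIR=$(rundir_path orchidee01 mylogin)
# prepare_rundir "$RUNDIR"    # before running
# clean_rundir "$RUNDIR"      # once the simulation is finished
```

The value printed by `rundir_path` is the one to put in RUN_DIR_PATH in the Job. Note that `clean_rundir` deletes everything under the given directory, so it should only be called after the outputs you need have been saved elsewhere.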