#354 closed Defect (fixed)
ORCA-LIM3 MPI problem?
Reported by: | MAMM | Owned by: | vancop |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | LIM3 | Version: | v3.0 |
Severity: | | Keywords: | LIM* MPI v3.0
Cc: | | |
Description
Hi,
I am running both ORCA-LIM2 and ORCA-LIM3 on a Beowulf-style cluster. ORCA-LIM2 runs fine. ORCA-LIM3 runs for 8.6 years of integration and then stops abruptly. The model results are fine up to that point. I get the following message:
<NO ERROR MESSAGE> : Pointer conversions exhausted
Too many MPI objects may have been passed to/from Fortran
without being freed
This means nothing to me, and I do not know where the problem first occurs, but I expect it must be a LIM3 issue, as I have no trouble with LIM2. I have, nevertheless, checked that the model does not run into out-of-bounds problems. Before I start searching for a solution, I wanted to check whether other NEMO users/developers have come across this problem.
Thanks,
Miguel Angel
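For context, the message reported above is typical of an MPI handle leak: a communicator is created over and over but never freed, so the library eventually exhausts its Fortran handle table and aborts. A hedged illustration of this failure mode (not NEMO code; the exact abort message depends on the MPI implementation):

```fortran
! Hedged illustration of the failure mode, not NEMO code: every MPI_COMM_DUP
! (or MPI_COMM_CREATE) consumes a Fortran handle; if the communicator is never
! freed, the MPI library eventually runs out of handles and aborts with an
! implementation-dependent message such as "Pointer conversions exhausted".
PROGRAM leak_demo
   USE mpi
   IMPLICIT NONE
   INTEGER :: icomm, istep, ierr
   CALL MPI_INIT( ierr )
   DO istep = 1, 100000                        ! stands in for the ice time-step loop
      CALL MPI_COMM_DUP( MPI_COMM_WORLD, icomm, ierr )
      ! CALL MPI_COMM_FREE( icomm, ierr )      ! the missing free: uncomment to stop the leak
   END DO
   CALL MPI_FINALIZE( ierr )
END PROGRAM leak_demo
```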
Commit History (0)
(No commits)
Change History (9)
comment:1 in reply to: ↑ description Changed 15 years ago by rblod
Replying to MAMM:
Hi Miguel,
I would suspect the routine mpp_ini_ice (in module lib_mpp.F90), in which I created a special communicator for ice processors only, called ncomm_ice. This routine is called at each ice time step, and for each category, before the thermodynamics call. The communicator is always overwritten but never really freed. A cleaner way to do this could be a call to MPI_COMM_FREE(ncomm_ice) at the end of the ice time step or at the beginning of the routine.
I hope it helps,
Rachid
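A minimal sketch of that suggestion, assuming a simplified mpp_ini_ice (the actual lib_mpp.F90 code is not shown in this ticket): release the previous ncomm_ice before recreating it.

```fortran
! Minimal sketch (assumed structure, not the actual lib_mpp.F90): free the
! previous ice communicator before creating a new one each ice time step.
MODULE ice_comm_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER :: ncomm_ice = MPI_COMM_NULL        ! ice-only communicator (name from the ticket)
CONTAINS
   SUBROUTINE mpp_ini_ice( ngrp_ice )
      INTEGER, INTENT(in) :: ngrp_ice          ! group of ice processors, built by the caller
      INTEGER :: ierr
      ! release the communicator left over from the previous call, if any
      IF( ncomm_ice /= MPI_COMM_NULL )   CALL MPI_COMM_FREE( ncomm_ice, ierr )
      ! create the new ice communicator from the ice group
      CALL MPI_COMM_CREATE( MPI_COMM_WORLD, ngrp_ice, ncomm_ice, ierr )
   END SUBROUTINE mpp_ini_ice
END MODULE ice_comm_sketch
```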
comment:2 follow-up: ↓ 3 Changed 15 years ago by MAMM
Rachid,
I am not sure I understand. There are already calls to mpp_comm_free(ncomm_ice) within the loop over categories in limthd.F90. I do not see how adding another call at the end can help. In fact, when I try, I get the following message: MPI_COMM_FREE : Null communicator
Sorry for the trouble. I realise it is probably a silly problem, but I just cannot see how to go about solving it.
Miguel Angel
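The "Null communicator" error is consistent with a double free: MPI_COMM_FREE sets the freed handle to MPI_COMM_NULL, so a second, unguarded free on the same handle fails. A hedged illustration (not the limthd.F90 code):

```fortran
! Hedged illustration (not the limthd.F90 code): MPI_COMM_FREE sets the handle
! to MPI_COMM_NULL, so a second, unguarded free on the same handle fails with
! the "Null communicator" error reported above.
PROGRAM double_free_demo
   USE mpi
   IMPLICIT NONE
   INTEGER :: icomm, ierr
   CALL MPI_INIT( ierr )
   CALL MPI_COMM_DUP( MPI_COMM_WORLD, icomm, ierr )
   CALL MPI_COMM_FREE( icomm, ierr )           ! ok: icomm becomes MPI_COMM_NULL
   IF( icomm /= MPI_COMM_NULL ) THEN           ! guard that avoids the error
      CALL MPI_COMM_FREE( icomm, ierr )
   ENDIF
   CALL MPI_FINALIZE( ierr )
END PROGRAM double_free_demo
```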
comment:3 in reply to: ↑ 2 Changed 15 years ago by rblod
All apologies for the trouble, I forgot I freed the communicator in limthd when I implemented this...
In the same routine mpp_ini_ice, the issue may be ngrp_ice, which is never destroyed; the fix could go just after the creation of the ice communicator (something like call mpi_group_free(ngrp_ice)), but here my knowledge of MPI reaches its limits.
In addition, as far as I know, LIM3 has been run successfully in parallel at NOCS for long integration periods.
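A minimal sketch of this second suggestion, with hypothetical inputs (n_ice, ranks_ice) standing in for however lib_mpp.F90 actually builds the ice group: the group handles can be freed as soon as the communicator has been created from them, since the communicator keeps its own reference.

```fortran
! Minimal sketch (hypothetical inputs, not the actual lib_mpp.F90): release the
! process groups right after the ice communicator has been created from them.
SUBROUTINE ini_ice_comm_sketch( n_ice, ranks_ice, ncomm_ice )
   USE mpi
   IMPLICIT NONE
   INTEGER, INTENT(in)  :: n_ice                ! number of processors holding ice
   INTEGER, INTENT(in)  :: ranks_ice(n_ice)     ! their ranks in MPI_COMM_WORLD
   INTEGER, INTENT(out) :: ncomm_ice            ! resulting ice-only communicator
   INTEGER :: ngrp_world, ngrp_ice, ierr
   CALL MPI_COMM_GROUP ( MPI_COMM_WORLD, ngrp_world, ierr )
   CALL MPI_GROUP_INCL ( ngrp_world, n_ice, ranks_ice, ngrp_ice, ierr )
   CALL MPI_COMM_CREATE( MPI_COMM_WORLD, ngrp_ice, ncomm_ice, ierr )
   ! the groups are no longer needed once the communicator exists
   CALL MPI_GROUP_FREE( ngrp_ice,   ierr )
   CALL MPI_GROUP_FREE( ngrp_world, ierr )
END SUBROUTINE ini_ice_comm_sketch
```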
Replying to MAMM:
Rachid,
I am not sure I understand. There are already calls to mpp_comm_free(ncomm_ice) within the loop over categories in limthd.F90. I do not see how adding another call at the end can help. In fact, when I try, I get the following message: MPI_COMM_FREE : Null communicator
Sorry for the trouble. I realise it is probably a silly problem, but I just cannot see how to go about solving it.
Miguel Angel
comment:4 Changed 12 years ago by clevy
- Owner changed from NEMO team to vancop
comment:5 Changed 10 years ago by clem
- Resolution set to fixed
- Status changed from new to closed
comment:6 Changed 7 years ago by nemo
- Keywords LIM* added
comment:7 Changed 7 years ago by nemo
- Keywords release-3.0 added
comment:8 Changed 2 years ago by nemo
- Keywords r3.0 added; release-3.0 removed
comment:9 Changed 2 years ago by nemo
- Keywords v3.0 added; r3.0 removed