#2492 closed Bug (fixed)
Out-of-bounds error in ORCA2-based mono-processor configuration
Reported by: | smueller | Owned by: | smueller |
---|---|---|---|
Priority: | low | Milestone: | |
Component: | ICB | Version: | v4.0 |
Severity: | minor | Keywords: | ICB LBC non-MPP v4.0 |
Cc: | smasson, pierre.mathiot@… |
Description
Context
A user of an ORCA2-based mono-processor configuration (key_mpp_mpi undefined) has reported an out-of-bounds error which occurs during the north-fold boundary exchange in subroutine lbc_nfd_2d_ext.
Analysis
This out-of-bounds error can readily be reproduced in the reference configuration ORCA2_ICE_PISCES by removing the CPP keys key_mpp_mpi and key_iomput.
The error originates in the initialisation of array tmask_e(0:jpi+1,0:jpj+1) in subroutine icb_init, so it does not occur when iceberg handling is disabled (ln_icebergs = .FALSE.). The initialisation of tmask_e is completed by a call to subroutine mpp_lnk_2d_icb (interface lbc_lnk_icb). Its north-fold treatment depends on whether jpni is 1 (which includes the mono-processor case) or greater. If jpni = 1, it calls subroutine lbc_nfd_2d_ext (implemented in source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/LBC/lbc_nfd_ext_generic.h90). Subroutine mpp_lnk_2d_icb passes only the section tmask_e(1:jpi,1:jpj+1) to lbc_nfd_2d_ext. Inside lbc_nfd_2d_ext this section is referred to as ptab(1:jpi,0:jpj), i.e., every dimension-2 index is shifted down by one. However, lbc_nfd_2d_ext accesses elements with a dimension-2 subscript of nlcj+1. Since nlcj equals jpj here, these accesses are out of bounds and may leave the array with incorrect content.
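The index shift can be illustrated with a small Fortran sketch. It assumes, hypothetically, that the array argument of lbc_nfd_2d_ext is an assumed-shape dummy whose dimension-2 lower bound is 1-kextj. This reproduces the ptab(1:jpi,0:jpj) view described above. The module and routine names are illustrative only and are not part of NEMO. The sketch passes both the original section and the section used by the fix proposed for lbclnk.F90, and records the dimension-2 bounds the callee sees.

```fortran
module icb_bounds_sketch
   implicit none
   private
   public :: dummy_dim2_bounds, compare_calls
contains

   ! Hypothetical stand-in for the array dummy argument of lbc_nfd_2d_ext:
   ! assumed shape, with the lower bound of dimension 2 set to 1-kextj
   ! (an assumption chosen to reproduce the ptab(1:jpi,0:jpj) view).
   ! Returns the dimension-2 bounds as seen inside the routine.
   subroutine dummy_dim2_bounds( ptab, kextj, klo, khi )
      integer, intent(in)  :: kextj
      real,    intent(in)  :: ptab(1:, 1-kextj:)
      integer, intent(out) :: klo, khi
      klo = lbound( ptab, 2 )
      khi = ubound( ptab, 2 )
   end subroutine dummy_dim2_bounds

   ! Passes the original and the fixed array section (as in the two
   ! versions of the jpni = 1 call in mpp_lnk_2d_icb) and records the
   ! dimension-2 bounds the callee sees for each.
   subroutine compare_calls( jpi, jpj, kextj, kold, knew )
      integer, intent(in)  :: jpi, jpj, kextj
      integer, intent(out) :: kold(2), knew(2)   ! (lower, upper) bounds of dimension 2
      ! extended array with a halo of width kextj; for kextj = 1 this
      ! has the same bounds as tmask_e(0:jpi+1,0:jpj+1)
      real :: ptab_e(1-kextj:jpi+kextj, 1-kextj:jpj+kextj)
      ptab_e = 1.0
      ! original call: section starts at row 1 and has jpj+kextj rows
      call dummy_dim2_bounds( ptab_e(1:jpi, 1:jpj+kextj), kextj, kold(1), kold(2) )
      ! fixed call: section starts at row 1-kextj and has jpj+2*kextj rows
      call dummy_dim2_bounds( ptab_e(1:jpi, 1-kextj:jpj+kextj), kextj, knew(1), knew(2) )
   end subroutine compare_calls

end module icb_bounds_sketch
```

Consider calling compare_calls with jpi = 4, jpj = 6 and kextj = 1. The callee then sees dimension-2 bounds 0:6 for the original section and 0:7 for the fixed one. Only the fixed section therefore contains the row nlcj+1 = 7 that the north-fold code accesses.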
The same error also affects two further groups of arrays:
- arrays {u,v}mask_e, initialised in subroutine icb_init (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90);
- arrays {uo,vo,ff,tt,fr,ua,va,hi,vi}_e, initialised in subroutine icb_utl_copy (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbutl.F90).

Independently of the out-of-bounds access, the lbc_lnk_icb calls that initialise arrays {u,v}mask_e appear to specify the wrong grid type ('T' instead of 'U' and 'V', respectively). This could cause further incorrect boundary exchanges for these two arrays.
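The grid type matters because, at the north fold, values on the top row are mirrored from an interior row. The mirrored point depends on where a variable sits on the C-grid. The sketch below assumes a T-point-pivot north fold, as in ORCA2. Under that assumption the mirrored point of column ji on the last row is:
- column jpiglo-ji+2 of row jpjglo-2 for T points;
- column jpiglo-ji+1 of row jpjglo-2 for U points;
- column jpiglo-ji+2 of row jpjglo-3 for V points.

These formulas are a simplified assumption modelled on that configuration, not a copy of NEMO's lbc_nfd routines. The routine nfd_partner is hypothetical.

```fortran
module nfd_partner_sketch
   implicit none
   private
   public :: nfd_partner
contains

   ! Hypothetical helper: for column ji on the last row of a global grid
   ! with a T-point-pivot north fold, return the interior point (ki, kj)
   ! from which its value is mirrored. Index formulas are an assumption.
   subroutine nfd_partner( cd_type, ji, kjpiglo, kjpjglo, ki, kj )
      character(len=1), intent(in)  :: cd_type            ! grid type: 'T', 'U' or 'V'
      integer,          intent(in)  :: ji                 ! column of the target point
      integer,          intent(in)  :: kjpiglo, kjpjglo   ! global grid size
      integer,          intent(out) :: ki, kj             ! source column and row
      select case ( cd_type )
      case ( 'T' )
         ki = kjpiglo - ji + 2 ; kj = kjpjglo - 2
      case ( 'U' )   ! U points sit half a cell east of T points
         ki = kjpiglo - ji + 1 ; kj = kjpjglo - 2
      case ( 'V' )   ! V points sit half a cell north of T points
         ki = kjpiglo - ji + 2 ; kj = kjpjglo - 3
      case default
         error stop 'nfd_partner: unsupported grid type'
      end select
   end subroutine nfd_partner

end module nfd_partner_sketch
```

As an example, take the ORCA2 global grid size (jpiglo = 182, jpjglo = 149). Treated as a T point, column 10 of the last row is filled from column 174. Treated as a U point, it is filled from column 173. Exchanging umask_e with grid type 'T' would therefore mirror each U-point mask value from a column offset by one.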
Fix
Subroutine mpp_lnk_2d_icb could be adjusted to preserve the dimension-2 bounds of the array section passed to subroutine lbc_nfd_2d_ext, i.e.,
src/OCE/LBC/lbclnk.F90

```diff
@@ -381,7 +381,7 @@
       IF( npolj /= 0 ) THEN
          !
          SELECT CASE ( jpni )
-         CASE ( 1 )     ;   CALL lbc_nfd         ( pt2d(1:jpi,1:jpj+kextj), cd_type, psgn, kextj )
+         CASE ( 1 )     ;   CALL lbc_nfd         ( pt2d(1:jpi,1-kextj:jpj+kextj), cd_type, psgn, kextj )
          CASE DEFAULT   ;   CALL mpp_lbc_north_icb( pt2d(1:jpi,1:jpj+kextj), cd_type, psgn, kextj )
          END SELECT
          !
```
Further, the grid types specified in the lbc_lnk_icb calls for arrays {u,v}mask_e could be corrected as follows:
src/OCE/ICB/icbini.F90

```diff
@@ -239,8 +239,8 @@
       umask_e(:,:) = 0._wp   ;   umask_e(1:jpi,1:jpj) = umask(:,:,1)
       vmask_e(:,:) = 0._wp   ;   vmask_e(1:jpi,1:jpj) = vmask(:,:,1)
       CALL lbc_lnk_icb( 'icbini', tmask_e, 'T', +1._wp, 1, 1 )
-      CALL lbc_lnk_icb( 'icbini', umask_e, 'T', +1._wp, 1, 1 )
-      CALL lbc_lnk_icb( 'icbini', vmask_e, 'T', +1._wp, 1, 1 )
+      CALL lbc_lnk_icb( 'icbini', umask_e, 'U', +1._wp, 1, 1 )
+      CALL lbc_lnk_icb( 'icbini', vmask_e, 'V', +1._wp, 1, 1 )
       !
       ! assign each new iceberg with a unique number constructed from the processor number
       ! and incremented by the total number of processors
```
Commit History (2)
Changeset | Author | Time | ChangeLog |
---|---|---|---|
13350 | smueller | 2020-07-28T14:28:29+02:00 | Remedy for the bugs reported in ticket #2492 |
13276 | mathiot | 2020-07-09T09:47:18+02:00 | ticket #2494 and #2375: wrong point type inn lbc_lnk_icb for umask_e and vmask_e (see ticket #2492) |
Change History (12)
comment:1 Changed 4 years ago by smasson
- Cc smasson added
comment:2 Changed 4 years ago by mathiot
- Cc pierre.mathiot@… added
comment:3 Changed 4 years ago by mathiot
comment:4 Changed 4 years ago by smueller
- Version v4.0.* deleted
Subroutine mpp_lnk_2d_icb appears to be called only from within the ICB source code (subroutines icb_init and icb_utl_copy). Its modification should therefore only affect model runs with iceberg handling enabled (ln_icebergs=.true.).

Further, the proposed modification only changes a subroutine call made when jpni=1. Runs of the model compiled with key_mpp_mpi that use ln_icebergs=.true. and jpni=1 are explicitly prevented (source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90:#L112). The modification should therefore only affect mono-processor runs without key_mpp_mpi.
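The reasoning of this comment can be condensed into a small predicate. This is a hypothetical sketch, and its logical and integer arguments are stand-ins:
- ld_icebergs stands in for the namelist flag ln_icebergs;
- ld_mpp_mpi stands in for the CPP key key_mpp_mpi, which in NEMO is a compile-time setting and is modelled here as a runtime flag;
- kjpni stands in for jpni.

The predicate returns .true. only for runs that can reach the modified jpni = 1 branch of mpp_lnk_2d_icb.

```fortran
module icb_fix_scope_sketch
   implicit none
   private
   public :: reaches_modified_branch
contains

   ! Hypothetical predicate: does a run execute the jpni = 1 branch of
   ! mpp_lnk_2d_icb that the proposed fix modifies?
   logical function reaches_modified_branch( ld_icebergs, ld_mpp_mpi, kjpni )
      logical, intent(in) :: ld_icebergs   ! stands in for the namelist flag ln_icebergs
      logical, intent(in) :: ld_mpp_mpi    ! stands in for the CPP key key_mpp_mpi
      integer, intent(in) :: kjpni         ! number of subdomains along i
      if ( .not. ld_icebergs ) then
         ! mpp_lnk_2d_icb is only called from the ICB code
         reaches_modified_branch = .false.
      else if ( ld_mpp_mpi ) then
         ! MPI runs with icebergs and jpni = 1 are rejected in icbini.F90,
         ! so the jpni = 1 branch is never reached with key_mpp_mpi
         reaches_modified_branch = .false.
      else
         ! without key_mpp_mpi the run is mono-processor (jpni = 1)
         reaches_modified_branch = ( kjpni == 1 )
      end if
   end function reaches_modified_branch

end module icb_fix_scope_sketch
```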
comment:5 Changed 4 years ago by smueller
- Version set to v4.0
comment:6 Changed 4 years ago by smueller
- Owner changed from systeam to smueller
- Status changed from new to assigned
In an email discussion, it was proposed to test source:/NEMO/releases/r4.0/r4.0-HEAD with the above fix for module lbclnk. The test would compare the run.stat output files of LONG runs of the ORCA2_ICE_PISCES reference configuration in three cases:
i) as used by SETTE (with key_mpp_mpi, jpni=4, and jpnj=8);
ii) with jpni=1 after disabling line source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90@13346:#L112;
iii) without key_mpp_mpi.
It was also suggested that the second bug reported above, the incorrect grid types specified when initialising arrays {u,v}mask_e, be fixed as proposed (see also [13276]).
comment:7 Changed 4 years ago by smueller
After disabling line source:/NEMO/releases/r4.0/r4.0-HEAD/src/OCE/ICB/icbini.F90@13346:#L112, the model in the ORCA2_ICE_PISCES reference configuration with jpni=1 and jpnj=32 crashes at time step 693. With the proposed fixes to modules lbclnk and icbini included, this crash no longer occurs.
comment:8 Changed 4 years ago by smueller
The proposed test (see comment:6) has been successful. The run.stat files produced using source:/NEMO/releases/r4.0/r4.0-HEAD/@13346 with the proposed fixes to modules lbclnk and icbini are identical across all three cases. In cases i and iii, they are also identical to the corresponding run.stat files produced using source:/NEMO/releases/r4.0/r4.0-HEAD@13346 without the proposed fixes. In case ii, one of the runs did not complete (see comment:7).
It has also been found that the tracer.stat output files differ between the MPP cases (i, ii) and the mono-processor case without key_mpp_mpi (iii). In both cases i and iii, however, the tracer.stat output remained unchanged after the proposed fixes were applied. This difference in tracer.stat output appears to be unrelated to the bugs detailed above and should be reported in a separate ticket.
comment:9 Changed 4 years ago by smueller
In 13350:
comment:10 Changed 4 years ago by smueller
- Resolution set to fixed
- Status changed from assigned to closed
source:/NEMO/releases/r4.0/r4.0-HEAD@13350 has passed the standard SETTE tests. Further, source:/NEMO/releases/r4.0/r4.0-HEAD@13350 compiled with debug options (incl. bounds checking) and without key_mpp_mpi and key_iomput runs successfully.
comment:11 Changed 4 years ago by mathiot
Should this fix also be added to the trunk? Is there a plan to push a large batch of bug fixes into the trunk based on the NEMO 4.0.3 release?
comment:12 Changed 2 years ago by nemo
- Keywords v4.0 added
In 13276: