New URL for NEMO forge!   http://forge.nemo-ocean.eu

In March 2022, along with the NEMO 4.2 release, code development moved to a self-hosted GitLab.
This forge is now archived and remains online for historical reference.
#1351 (Problems with AMM12 SETTE tests with -O3 optimisation (ifort compiler)) – NEMO

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#1351 closed Bug (fixed)

Reported by: acc
Owned by: nemo
Priority: low
Milestone:
Component: OCE
Version: v3.6
Severity:
Keywords:
Cc:

Description

My initial attempts to run the AMM12 SETTE tests with a v3.6 trunk (rev 4673) failed when using the ifort compiler (v14) at the -O3 optimisation level. The runs failed with CFL breaches after 12 time steps. AMM12 ran successfully at -O2 and below, and all other tests (except AGRIF) ran successfully at -O3.

I eventually tracked the cause to this block at line 291 in dynspg_ts.F90 (key_vectopt_loop is not defined):

      DO jk = 1, jpkm1
#if defined key_vectopt_loop
         DO jj = 1, 1         !Vector opt. => forced unrolling
            DO ji = 1, jpij
#else 
         DO jj = 1, jpj
            DO ji = 1, jpi
#endif
               zu_frc(ji,jj) = zu_frc(ji,jj) + fse3u_n(ji,jj,jk) * ua(ji,jj,jk) * umask(ji,jj,jk)
               zv_frc(ji,jj) = zv_frc(ji,jj) + fse3v_n(ji,jj,jk) * va(ji,jj,jk) * vmask(ji,jj,jk)
            END DO
         END DO
      END DO

This looks harmless, but adding a compiler directive to suppress loop fusion lets the SETTE test pass at -O3, i.e.:

      DO jk = 1, jpkm1
!DIR$ NOFUSION
#if defined key_vectopt_loop
         DO jj = 1, 1         !Vector opt. => forced unrolling
            DO ji = 1, jpij
#else 
         DO jj = 1, jpj
            DO ji = 1, jpi
#endif
               zu_frc(ji,jj) = zu_frc(ji,jj) + fse3u_n(ji,jj,jk) * ua(ji,jj,jk) * umask(ji,jj,jk)
               zv_frc(ji,jj) = zv_frc(ji,jj) + fse3v_n(ji,jj,jk) * va(ji,jj,jk) * vmask(ji,jj,jk)
            END DO
         END DO
      END DO

Does anyone have a clue what might be happening here?

Commit History (1)

Changeset: 4687
Author: acc
Time: 2014-06-24T17:22:03+02:00
ChangeLog: #1351 alternative loop structure to fix errors in dynspg_ts.F90 when compiling with -O3 and the ifort compiler. Without this change the AMM12 SETTE tests fail after 12 timesteps. Also included a single line efficiency change in domzgr.F90 and improvements to sette scripts and local NOCS files.

Change History (5)

comment:1 Changed 10 years ago by acc

Alternatively, if I replace the inner loops and reduce the block to:

      DO jk = 1, jpkm1
         zu_frc(:,:) = zu_frc(:,:) + fse3u_n(:,:,jk) * ua(:,:,jk) * umask(:,:,jk)
         zv_frc(:,:) = zv_frc(:,:) + fse3v_n(:,:,jk) * va(:,:,jk) * vmask(:,:,jk)
      END DO

then the tests are successful at -O3 without any compiler directives. Can anyone see any reason not to make this change permanent?
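
For anyone wanting to check the two forms outside NEMO, here is a minimal standalone sketch that accumulates the same vertical sum with the explicit ji/jj loops and with whole-array syntax, then prints the largest difference. The program name, the array sizes, the random test data and the plain array e3u (standing in for the fse3u_n macro) are all assumptions made for illustration; it is not taken from dynspg_ts.F90 and may well not reproduce the -O3 failure.

```fortran
program compare_loop_forms
   ! Hypothetical standalone test, not NEMO code: compares the explicit
   ! loop nest with the whole-array form for the zu_frc accumulation.
   implicit none
   integer, parameter :: wp = selected_real_kind(12, 307)   ! double precision
   integer, parameter :: jpi = 8, jpj = 6, jpk = 5          ! assumed small grid
   integer, parameter :: jpkm1 = jpk - 1
   real(wp) :: e3u(jpi,jpj,jpk)      ! stand-in for the fse3u_n macro
   real(wp) :: ua(jpi,jpj,jpk), umask(jpi,jpj,jpk)
   real(wp) :: zu_loop(jpi,jpj), zu_arr(jpi,jpj)
   integer  :: ji, jj, jk

   ! Arbitrary test data; the mask zeroes one column to mimic land points
   call random_number(e3u)
   call random_number(ua)
   umask(:,:,:) = 1.0_wp
   umask(1,:,:) = 0.0_wp

   zu_loop(:,:) = 0.0_wp
   zu_arr(:,:)  = 0.0_wp

   ! Form 1: explicit loops, as in the original block
   do jk = 1, jpkm1
      do jj = 1, jpj
         do ji = 1, jpi
            zu_loop(ji,jj) = zu_loop(ji,jj) + e3u(ji,jj,jk) * ua(ji,jj,jk) * umask(ji,jj,jk)
         end do
      end do
   end do

   ! Form 2: whole-array syntax, as in the proposed replacement
   do jk = 1, jpkm1
      zu_arr(:,:) = zu_arr(:,:) + e3u(:,:,jk) * ua(:,:,jk) * umask(:,:,jk)
   end do

   print *, 'max abs difference between the two forms:', maxval(abs(zu_loop - zu_arr))
end program compare_loop_forms
```

Building this at both -O2 and -O3 with the same floating-point options used for NEMO would show whether the two forms diverge in isolation; if they do not, the problem probably depends on the surrounding code in the real routine.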

comment:2 Changed 10 years ago by jchanut

We found exactly the same solution. It is weird, though, and it may depend on which ifort version you use.
I see no reason not to make this change.

comment:3 Changed 10 years ago by acc

  • Resolution set to fixed
  • Status changed from new to closed

Change submitted at revision #4687

comment:4 Changed 10 years ago by smasson

Do you use the following compilation option?

-fp-model precise

comment:5 Changed 10 years ago by acc

Yes, although on this machine '-fp-model source' makes more sense; otherwise you get a lot of warnings:

ifort: command line warning #10212: -fp-model precise evaluates in source precision with Fortran.

Both give the same error after 12 time steps with the original code.
