Opened 8 years ago

Closed 7 years ago

Last modified 6 years ago

#302 closed enhancement (fixed)

optimization in pack_restart

Reported by: mafoipsl Owned by: sdipsl
Priority: major Milestone: libIGCM_v2.8.3
Component: PostProcessing Version:
Keywords: pack storage files Cc:

Description

After experimentation, the proposal is to keep only the last period in pack_output instead of all periods. If one needs to restart after a packed period, one will have to rerun the full packperiod. For a packperiod of 10Y this saves 90% of storage.

Does anyone disagree?

Change History (9)

comment:1 follow-up: Changed 8 years ago by sdipsl

You mean pack_restart, right? If so, I agree.

comment:2 Changed 8 years ago by mafoipsl

  • Summary changed from optimization in pack_output to optimization in pack_restart

comment:3 in reply to: ↑ 1 Changed 8 years ago by mafoipsl

Replying to sdipsl:

You mean pack_restart, right? If so, I agree.

yes!!!!!

comment:4 Changed 7 years ago by sdipsl

Very easy to do, but it will prevent restarting from piControl using consecutive years, for example.

comment:5 Changed 7 years ago by mafoipsl

Default: keep the last period only, to significantly reduce the size of the file produced in RESTART.

comment:6 Changed 7 years ago by nillod

In decadal simulations we need access to all of the restart years, not only the last ones (we choose restart years based on, for example, ENSO or NAO phase, eruptions...).
Could this be an optional flag to activate when storage optimisation is needed?

comment:7 Changed 7 years ago by sdipsl

  • Milestone changed from libIGCM_v3 release candidate to libIGCM_v2.8.3
  • Owner changed from somebody to sdipsl
  • Status changed from new to assigned

I will introduce the flag lightRestartPack, which will be false by default for backward compatibility.

comment:8 Changed 7 years ago by sdipsl

  • Resolution set to fixed
  • Status changed from assigned to closed

Done, see r1395.

comment:9 Changed 6 years ago by mafoipsl

This does not work for a simulation with PeriodLength=1M starting on 1/2, 1/3, ..., 1/12, like the 4xCO2 mini-ensemble.

Example here:
...home/gencmip6/??????/CMIP6/DECK/IPSLCM6.1.5-LR/modipsl/config/IPSLCM6/CM61-LR-4xCO2-02/POST_REDO/PACKRESTART.out_539642

The variable ${PeriodDateBegin} is not defined in pack_restart.job at this line:

   220       if [ ${date_file} -le ${date_end_pack} ] && [ ${date_file} -ge ${PeriodDateBegin} ] ; then 

Sorry, I didn't take the time to understand the algorithm.
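
For illustration, here is a minimal sketch of how such a date test could be guarded against an empty bound. It is not the code from pack_restart.job: the helper name in_pack_window and its argument order are assumptions made for this example, and dates are taken to be YYYYMMDD integers as the -le/-ge comparisons suggest. In bash, for instance, an unset and unquoted variable leaves `[` with too few arguments, so the test fails with an error and the condition is never true.

```shell
#!/bin/bash
# Sketch only: in_pack_window is a hypothetical helper, not part of libIGCM.
# Arguments: date of the file, first date of the window, last date of the window.
in_pack_window() {
  local date_file=$1 date_begin=$2 date_end=$3

  # Refuse to compare when a bound is empty, instead of letting `[` error out.
  if [ -z "${date_begin}" ] || [ -z "${date_end}" ]; then
    echo "in_pack_window: missing date bound" >&2
    return 2
  fi

  # Quoting keeps each operand a single argument even if it is empty.
  [ "${date_file}" -le "${date_end}" ] && [ "${date_file}" -ge "${date_begin}" ]
}
```

A caller could then distinguish "outside the window" (status 1) from "bound not defined" (status 2) and stop the job in the second case rather than silently skipping every file.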

The consequence is a directory with more than 50,000 files (60 months x 3 components x 360 files) and an email from TGCC flagging the problem:

Hello,

Inappropriate use of the WORK file system has been detected for your user account p86maf. This is the 1st alert concerning this problem.

To avoid degrading the metadata performance of a parallel file system, a directory in this file system must not contain more than 50000 entries. For more information, you can refer to the chapter "Recommended data usage on parallel file system" in the user documentation.

In case of repeated inappropriate use, the ability to submit compute jobs will be suspended at the 3rd alert for your user account p86maf. At the 5th alert, your user account p86maf will be locked and your data may be automatically compacted or even deleted.

As a reminder, this is the escalation process for repeated alerts concerning this problem:

* Alert 1: information (week 1)
* Alert 2: information (week 2)
* Alert 3: job submission blocked (week 3)
* Alert 4: information (week 4)
* Alert 5: account locked and data may be compacted/deleted (week 5)

Below are the 11 directories exceeding the maximum number of entries (for a total of 718839 entries):

+------------------------------------------------------------------+-----------+----------+
|                            Directories                           | Inodes    |  Alert   |
+------------------------------------------------------------------+-----------+----------+
| /ccc/work/.scratch/cont003/gencmip6/??????/RUN_DIR/450648_80870  | 65349     | 1        |
...
+------------------------------------------------------------------+-----------+----------+

Thank you for helping us maintain a good quality of service at the computing centre.
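
To spot this situation before the computing centre does, a check along the following lines could be run on a run directory. This is only a sketch: the function names count_entries and report_overfull are invented for this example, the limit is the 50000 entries quoted in the message above, and it relies on the -mindepth/-maxdepth options of GNU or BSD find.

```shell
#!/bin/bash
# Sketch only: count_entries and report_overfull are illustrative names.
MAX_ENTRIES=50000   # per-directory limit quoted in the TGCC message

# Count the direct entries of one directory (files and subdirectories, hidden ones included).
count_entries() {
  find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d '[:space:]'
}

# Print "<directory> <count>" for every directory under the root that exceeds the limit.
report_overfull() {
  local root=$1 dir n
  find "${root}" -type d | while IFS= read -r dir; do
    n=$(count_entries "${dir}")
    if [ "${n}" -gt "${MAX_ENTRIES}" ]; then
      echo "${dir} ${n}"
    fi
  done
}
```

For the case reported here, 60 months x 3 components x 360 files gives 64800 entries, which is above the 50000 limit.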

...