#25 Crash on Lemaitre3

Open
opened 4 years ago by pbarriat · 1 comment

Dear all,

On Lemaitre3, with the standard configuration (see [README](https://gogs.elic.ucl.ac.be/pbarriat/ecearth_patch/src/master/README.md)), if you sometimes get this crash:

```
Sep 16 15:27:19 lm3-w045 ifsmaster-ecconf: (hfi/PSM)[94909]: PSM2 can't open hfi unit: -1 (err=23)
```

or this one:

```
forrtl: Remote I/O error
```

you need to make a small change to the OASIS coupler code.

From your EC-Earth repository, open `sources/oasis3-mct/lib/psmile/src/mod_oasis_method.F90` and replace the following statements (lines 423, 429 and 436):

```
WRITE(filename,'(a,i2.2)') 'debug.root.',compid
WRITE(filename2,'(a,i2.2)') 'debug.notroot.',compid
WRITE(filename,'(a,i2.2,a,i6.6)') 'debug.',compid,'.',mpi_rank_local
```

with:

```
WRITE(filename,'(a,i2.2)') '/dev/shm/debug.root.',compid
WRITE(filename2,'(a,i2.2)') '/dev/shm/debug.notroot.',compid
WRITE(filename,'(a,i2.2,a,i6.6)') '/dev/shm/debug.',compid,'.',mpi_rank_local
```

Once done, recompile OASIS, IFS and NEMO.
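To see which file names the patched statements produce, you can exercise the three `WRITE` statements in isolation. The following is a minimal sketch. The module name `oasis_debug_names`, the three function names and the result length of 256 characters are hypothetical and are not part of OASIS. Only the `WRITE` statements are copied from the patch. In OASIS itself, `filename` and `filename2` are declared in `mod_oasis_method.F90`, so check that their declared length can hold the longer `/dev/shm/...` paths.

```fortran
! Hypothetical module reproducing the three patched WRITE statements
! from mod_oasis_method.F90, so the resulting file names can be checked.
module oasis_debug_names
  implicit none
  private
  public :: root_debug_name, notroot_debug_name, rank_debug_name

  ! Assumed length: must be large enough for the longest '/dev/shm/...' name
  integer, parameter :: name_len = 256

contains

  ! Patched 'debug.root.' statement, e.g. /dev/shm/debug.root.01
  function root_debug_name(compid) result(filename)
    integer, intent(in) :: compid
    character(len=name_len) :: filename
    WRITE(filename,'(a,i2.2)') '/dev/shm/debug.root.',compid
  end function root_debug_name

  ! Patched 'debug.notroot.' statement, e.g. /dev/shm/debug.notroot.01
  function notroot_debug_name(compid) result(filename2)
    integer, intent(in) :: compid
    character(len=name_len) :: filename2
    WRITE(filename2,'(a,i2.2)') '/dev/shm/debug.notroot.',compid
  end function notroot_debug_name

  ! Patched per-rank statement, e.g. /dev/shm/debug.03.000042
  function rank_debug_name(compid, mpi_rank_local) result(filename)
    integer, intent(in) :: compid, mpi_rank_local
    character(len=name_len) :: filename
    WRITE(filename,'(a,i2.2,a,i6.6)') '/dev/shm/debug.',compid,'.',mpi_rank_local
  end function rank_debug_name

end module oasis_debug_names
```

The `i2.2` and `i6.6` edit descriptors zero-pad the component ID and the MPI rank. Because the prefix is an absolute path, the files no longer land in the run directory.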

Reason:

The scratch file system on Lemaitre3 is BeeGFS, which handles large numbers of small files poorly. At the beginning of a run, OASIS creates many small files within a very short time, and BeeGFS sometimes cannot cope with them.

It is therefore better to write these files to RAM (`/dev/shm/`) than to your run directory on scratch.

Charles Pelletier commented 4 years ago
Collaborator

The same bug also affected the PARAMOUR NEMO-CCLM coupled setup, and the fix described above solved it. Thanks, PY.

@klein: Would it be possible to add a CPP key "BeeGFS" to OASIS and adapt the code so that the fix described above is applied when that key is defined?
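A CPP key along the lines suggested here could select the directory prefix at compile time. The following is a hypothetical sketch. The key name `BEEGFS`, the module `oasis_debug_dir`, the parameter `debug_dir` and the function `root_debug_name` do not exist in OASIS. The file must be preprocessed (gfortran and ifort do this by default for the `.F90` extension), and the key would be enabled with `-DBEEGFS` when compiling OASIS. Only the `debug.root.` name is shown; the other two statements would take the same prefix.

```fortran
! Hypothetical sketch: choose the OASIS debug-file directory with a CPP key.
! Compile with -DBEEGFS to write the debug files to /dev/shm (RAM).
module oasis_debug_dir
  implicit none
  private
  public :: debug_dir, root_debug_name

#ifdef BEEGFS
  ! RAM-backed tmpfs: avoids creating many small files on the BeeGFS scratch
  character(len=*), parameter :: debug_dir = '/dev/shm/'
#else
  ! Default: no prefix, so the files are created in the current (run) directory
  character(len=*), parameter :: debug_dir = ''
#endif

contains

  ! 'debug.root.' name with the selected directory prepended
  function root_debug_name(compid) result(filename)
    integer, intent(in) :: compid
    character(len=256) :: filename
    WRITE(filename,'(a,a,i2.2)') debug_dir, 'debug.root.', compid
  end function root_debug_name

end module oasis_debug_dir
```

Keeping the default prefix empty leaves the current behaviour unchanged unless the key is defined. Note that `/dev/shm` is local to each compute node. The debug files therefore end up spread over the nodes rather than in the run directory, and they may be removed when the job ends.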
