#25 Crash on Lemaitre3

Open
Opened 4 years ago by pbarriat · 1 comment

Dear all,

On Lemaitre3, with the standard configuration (see README: https://gogs.elic.ucl.ac.be/pbarriat/ecearth_patch/src/master/README.md), if you sometimes get this crash:

Sep 16 15:27:19 lm3-w045 ifsmaster-ecconf: (hfi/PSM)[94909]: PSM2 can't open hfi unit: -1 (err=23) 

or this one:

forrtl: Remote I/O error

you should make a small change to the code of the OASIS coupler.

From your ec-earth repository, open sources/oasis3-mct/lib/psmile/src/mod_oasis_method.F90 and replace:

 423                WRITE(filename,'(a,i2.2)') 'debug.root.',compid
 429                WRITE(filename2,'(a,i2.2)') 'debug.notroot.',compid
 436            WRITE(filename,'(a,i2.2,a,i6.6)') 'debug.',compid,'.',mpi_rank_local

with:

 423                WRITE(filename,'(a,i2.2)') '/dev/shm/debug.root.',compid
 429                WRITE(filename2,'(a,i2.2)') '/dev/shm/debug.notroot.',compid
 436            WRITE(filename,'(a,i2.2,a,i6.6)') '/dev/shm/debug.',compid,'.',mpi_rank_local

Once done, recompile OASIS, IFS and NEMO.

Reason:

The scratch on Lemaitre3 is a BeeGFS file system, which "doesn't like" small files. At the beginning of a run, OASIS creates many small files in a very short time, and BeeGFS sometimes cannot handle them.

So it is better to write these files to RAM (/dev/shm/) instead of your run directory (scratch).

Charles Pelletier commented 4 years ago
Collaborator

The same bug also affected the PARAMOUR NEMO-CCLM coupled setup. The fix described above solved the issue. Thanks PY.

@klein: Would it be possible to add a "BeeGFS" CPP key to OASIS, and adapt the code so that the fix described above is applied when that key is set?
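
To make the request concrete, here is a minimal, standalone sketch of what such a key could look like. It is not OASIS code: the key name `key_beegfs`, the `debug_dir` parameter and the program itself are assumptions made up for illustration, and only the file name patterns are taken from the lines quoted in the issue.

```fortran
! Sketch only: 'key_beegfs' is a hypothetical CPP key, not an existing OASIS option.
! Needs preprocessing, e.g. a .F90 suffix, or -cpp (gfortran) / -fpp (ifort).
program debug_prefix_sketch
   implicit none
#ifdef key_beegfs
   ! Key set: write the per-process debug files to RAM-backed storage.
   character(len=*), parameter :: debug_dir = '/dev/shm/'
#else
   ! Key not set: keep the default behaviour (current run directory).
   character(len=*), parameter :: debug_dir = ''
#endif
   character(len=64) :: filename
   integer :: compid, mpi_rank_local

   ! Placeholder values; in OASIS these come from the coupler itself.
   compid = 1
   mpi_rank_local = 42

   ! Same name patterns as in the issue, with the directory prefix prepended.
   write(filename,'(a,a,i2.2)') debug_dir, 'debug.root.', compid
   print '(a)', trim(filename)

   write(filename,'(a,a,i2.2,a,i6.6)') debug_dir, 'debug.', compid, '.', mpi_rank_local
   print '(a)', trim(filename)
end program debug_prefix_sketch
```

Building it with and without `-Dkey_beegfs` shows the two sets of file names. One thing to keep in mind with this approach is that /dev/shm is local to each compute node, so the debug files would no longer be visible from the run directory.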
