intel mkl crashes after reading wfcs (yambo 4.4.1 / qe 6.4.1)

wachr
Posts: 30
Joined: Wed Sep 24, 2014 4:43 pm

Post by wachr » Thu Jan 09, 2020 10:43 am

Dear all,

when calculating a 2D heterostack (for BSE, converging k-points), I came across instabilities of the Intel MKL (2019) linked with QE and yambo:
In DFT (QE), when increasing the k-point density from 12x12x1 to 18x18x1 (hexagonal unit cell), I had to change the parallelization strategy in QE to avoid segfaults (running QE with mpirun pw.x -nk 2, i.e. two pools of processors for k-point parallelization).

Independently of the k-point density, also for 12x12x1, I found a segfault when yambo was reading the last part of the wavefunctions on the first processor:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x13787
[ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x7f1d77d065e0]
[ 1] /beegfs-home/modules/intelmkl/compilers_and_libraries_2019/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so(cdotc+0xba)[0x7f1d7f1eaeda]
I therefore tried an older version of yambo built without MKL (4.2.1) to process the database. The following error then appeared: ./SAVE//ndb.gops; Variable GROT; NetCDF: Start+count exceeds dimension bound. To resolve this, I removed ndb.gops and re-ran the initialization step of yambo, again with the older version, and it runs. But this is unsatisfactory, because the numerical routines from the system are loaded, which strongly decreases the performance.

So my question is what to do to get yambo running with good performance using the MKL. Based on the QE experience, I also modified the parallelization strategy in yambo, without success. Is there any idea on this? Could this be a compilation issue?

Thank you very much!
Christian

P.S. Some I/O from the 6.4.1 run is attached.
edit: typos.
Christian Wagner
Institute of Physics
Chemnitz University of Technology, Germany
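
The pool setup mentioned in the post above can be sketched as follows. This is only a minimal sketch: the process count of 8 and the input file name scf.in are hypothetical and not taken from the post, and pools_ok is an illustrative helper. The one constraint it checks is that the number of MPI processes has to be a multiple of the number of pools passed to -nk.

```shell
# Hypothetical values, not taken from the original post.
NPROC=8      # total number of MPI processes (assumption)
NPOOL=2      # number of k-point pools, as in "pw.x -nk 2"

# pw.x needs the MPI process count to be a multiple of the pool count.
pools_ok() {
  [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}

if pools_ok "$NPROC" "$NPOOL"; then
  # Print the command instead of running it; drop "echo" to launch the job.
  echo mpirun -np "$NPROC" pw.x -nk "$NPOOL" -in scf.in
fi
```

Printing the command first makes it easy to check the pool layout before submitting to a queue.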

Daniele Varsano
Posts: 2149
Joined: Tue Mar 17, 2009 2:23 pm

Post by Daniele Varsano » Thu Jan 09, 2020 10:59 am

Dear Christian,
we will have a look at that. Can you also post your config.log files?
What seems strange to me is that the crash happens when reading the wfs, so it does not seem related to the linear algebra operations.
In the meantime, you can try to compile yambo-4.4 using the internal linear algebra:
./configure --enable-int-linalg
and see if it runs and whether the performance is severely compromised.

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Post by wachr » Thu Jan 09, 2020 11:12 am

Dear Daniele,

thank you very much for the extremely quick answer! I will try your suggestion to resolve the issue.

I used the following configure call to produce the attached config.log file:

./configure --prefix=$(pwd) --enable-iotk --with-iotk-path=$(pwd)/../iotk  --enable-mpi  --enable-uspp  --with-fft-path=/beegfs-home/modules/fftw3/3.3.8/ --with-lapack-libs="-lmkl_intel_lp64  -lmkl_sequential -lmkl_core" --with-blas-libs="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core" --with-scalapack-libs=-lscalapack --with-blacs-libs=-lblacs 
Best regards!
Christian

Post by wachr » Thu Jan 09, 2020 12:21 pm

I compiled yambo with the internal linear algebra and ran the job again: it works. So this is a working state for now.

If the performance of the internal BLAS/LAPACK is similar to that of the MKL, I will not put more energy into the MKL version. However, this error may hit somebody else, so if you can reproduce it, I would be happy to receive feedback :).

All the best and thank you for your effort!
Christian

andrea.ferretti
Posts: 115
Joined: Fri Jan 31, 2014 11:13 am

Post by andrea.ferretti » Fri Jan 10, 2020 10:17 am

Dear Christian,

please note that the USPP implementation of yambo is meant to be a beta version. If you are not using USPP pseudopotentials, I would remove
--enable-uspp from the configure line (it may trigger unwanted behaviour).

Moreover, the Intel19 compiler has been found to miscompile (or mis-optimize) a parallelism-related library of yambo, leading to random crashes.
This problem is compiler-related (e.g. it does not show up when using the GNU compiler) and was worked around in yambo-4.5 (just released).
If you have to recompile the code, I would check out this version to get rid of this possible compiler issue.

take care
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

Post by wachr » Tue Jan 14, 2020 11:49 am

Dear Andrea,

thank you very much! Actually, I used the gnu7 compiler (a detail I forgot to mention, although it is visible in the configure output).

I will then remove the --enable-uspp flag for the compilation with the Intel MKL and report whether this was the reason. I may also try yambo 4.5 (is it fully compatible with the wavefunctions and databases for yambo 4.4?).

Best regards,
Christian
Last edited by wachr on Thu Jan 16, 2020 8:33 am, edited 1 time in total.
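
One thing that may be worth checking here, although nothing in this thread confirms it as the cause: the backtrace ends in cdotc inside libmkl_intel_lp64, while the build used the GNU compiler. MKL ships a separate interface layer for GNU Fortran (libmkl_gf_lp64), because the GNU and Intel Fortran compilers return complex function results such as that of cdotc differently. The sketch below only picks the interface library for a given compiler and prints the matching configure options; the helper name mkl_interface_lib is hypothetical.

```shell
# Hypothetical helper: map a Fortran compiler name to the MKL LP64 interface layer.
mkl_interface_lib() {
  case "$1" in
    gfortran*) echo "-lmkl_gf_lp64" ;;     # GNU Fortran interface layer
    ifort*)    echo "-lmkl_intel_lp64" ;;  # Intel Fortran interface layer
    *)         return 1 ;;                 # unknown compiler: decide by hand
  esac
}

IFACE=$(mkl_interface_lib gfortran)
# Same library list as in the configure call posted earlier in the thread,
# with only the interface layer swapped.
echo "--with-blas-libs=\"$IFACE -lmkl_sequential -lmkl_core\""
echo "--with-lapack-libs=\"$IFACE -lmkl_sequential -lmkl_core\""
```

Whether this removes the segfault would still have to be tested on the actual system.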

Post by andrea.ferretti » Tue Jan 14, 2020 11:56 am

Dear Christian,

thanks for reporting.
Yes, wave functions converted from 4.4 should be compatible with 4.5
(worst case scenario, delete the SAVE/ndb* files and re-do the initialisation).

Andrea
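
The worst-case recovery described above could look like the sketch below. It moves the SAVE/ndb* files aside rather than deleting them, so the step can be undone. The function name stash_ndb and the backup directory name are hypothetical, and running yambo with no options in the directory that contains SAVE is assumed to redo the initialisation.

```shell
# Hypothetical helper: move SAVE/ndb* into a backup directory instead of deleting.
stash_ndb() {
  save_dir=$1
  backup_dir=$2
  mkdir -p "$backup_dir" || return 1
  for f in "$save_dir"/ndb*; do
    # Skip the unexpanded pattern when no ndb* file exists.
    [ -e "$f" ] || continue
    mv "$f" "$backup_dir"/ || return 1
  done
}

# Usage (assumption: run from the directory that contains SAVE):
#   stash_ndb SAVE SAVE_ndb_backup
#   yambo        # with no options, expected to redo the initialisation
```

Files that do not match ndb*, such as the converted wave functions, are left in place.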
