BSE stop with "Allocation of K_slk%blc failed"

Postby sdwang » Tue Aug 13, 2019 9:03 am

Dear developers,
I ran into a new problem when running BSE. The log ends as follows:
...
<04d-15h-25m-39s> P0003: Kernel |################# | [085%] 03d-17h-09m-34s(E) 04d-08h-53m-09s(X)
<04d-18h-32m-47s> P0003: Kernel |################## | [090%] 03d-20h-16m-43s(E) 04d-06h-31m-32s(X)
<04d-21h-34m-55s> P0003: Kernel |################### | [095%] 03d-23h-18m-50s(E) 04d-04h-19m-21s(X)
<05d-00h-03m-35s> P0003: Kernel |####################| [100%] 04d-01h-47m-30s(E) 04d-01h-47m-30s(X)
<05d-04h-00m-33s> P0003: [07] BSE solver(s)
<05d-04h-00m-33s> P0003: [LA] SERIAL linear algebra


<05d-04h-00m-33s> P0003: [07.01] Inversion solver
P0003: [ERROR] STOP signal received while in :[07.01] Inversion solver
P0003: [ERROR]Allocation of K_slk%blc failed
....
I generated the input with ./yambo -b -o b -k sex -y i -V all, and the parallelization part of the input is:
PAR_def_mode= "balanced" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
X_all_q_CPU= "1 1 1 28 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q g k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_CPU= "1 14 2" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra

Thanks!

SD
S. D. Wang
IMU,HOHHOT,CHINA
E-mail: sdwang@imu.edu.cn
sdwang
 
Posts: 195
Joined: Fri Apr 09, 2010 12:30 pm

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby Daniele Varsano » Thu Aug 22, 2019 10:12 am

Dear Shudong,
we will have a look at it. In the meantime, does the calculation run smoothly when using the "diago" (-y d) or "haydock" (-y h) BSE solver?
Many thanks,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Daniele Varsano
 
Posts: 1996
Joined: Tue Mar 17, 2009 2:23 pm

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby sdwang » Thu Aug 22, 2019 10:26 am

Ciao Daniele,
Actually, this is the question I wanted to ask here. The calculation stops. I found that the BSE calculation with -y d or -y i uses all of my memory: usage keeps growing while the BSE runs, until the memory is exhausted and the calculation dies. The GW calculation is fine, so why does the BSE use so much more memory, even for my 3-atom 2D MoS2? It seems the BSE cannot distribute the memory over the cores. My BSE input for MoS2 is:
GPL Version 4.3.2 Revision 134. (Based on r.15658 h.afdb12
# MPI+SLK Build
# http://www.yambo-code.org
#
rim_cut # [R RIM CUT] Coulomb potential
optics # [R OPT] Optics
bss # [R BSS] Bethe Salpeter Equation solver
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
bse # [R BSE] Bethe Salpeter Equation.
bsk # [R BSK] Bethe Salpeter Equation kernel
StdoHash= 20 # [IO] Live-timing Hashes
Nelectro= 26.00000 # Electrons number
ElecTemp= 0.000000 eV # Electronic Temperature
BoseTemp=-1.000000 eV # Bosonic Temperature
OccTresh=0.1000E-4 # Occupation treshold (metallic bands)
NLogCPUs=0 # [PARALLEL] Live-timing CPU`s (0 for all)
DBsIOoff= "none" # [IO] Space-separated list of DB with NO I/O. DB=(DIP,X,HF,COLLs,J,GF,CARRIERs,W,SC,BS,ALL)
DBsFRAGpm= "none" # [IO] Space-separated list of +DB to FRAG and -DB to NOT FRAG. DB=(DIP,X,W,HF,COLLS,K,BS,QINDX,RT,ELP
FFTGvecs= 45 Ry # [FFT] Plane-waves
#WFbuffIO # [IO] Wave-functions buffered I/O
PAR_def_mode= "memory" # [PARALLEL] Default distribution mode ("balanced"/"memory"/"workload")
X_all_q_CPU= "1 1 32 1" # [PARALLEL] CPUs for each role
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,g,k,c,v)
X_all_q_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_CPU= "1 1 32" # [PARALLEL] CPUs for each role
BS_ROLEs= "k eh t" # [PARALLEL] CPUs roles (k,eh,t)
BS_nCPU_LinAlg_INV= 1 # [PARALLEL] CPUs for Linear Algebra
BS_nCPU_LinAlg_DIAGO= 1 # [PARALLEL] CPUs for Linear Algebra
NonPDirs= "none" # [X/BSS] Non periodic chartesian directions (X,Y,Z,XY...)
RandQpts= 1000000 # [RIM] Number of random q-points in the BZ
I also tried assigning the cores to the k role, but that took even more memory than the setting above.

Thanks!

Shudong

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby Daniele Varsano » Thu Aug 22, 2019 11:13 am

Dear Shudong,
can you post the complete input/report file? What is the dimension of your matrix and the kernel parameter?
Best,
Daniele

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby sdwang » Thu Aug 22, 2019 1:39 pm

Dear Daniele,
Attached are the input and log files. Please note that I used a block size of 1 Ry, but the run still uses 256 GB of memory for 2D MoS2.

Thanks!

Ciao



Shudong

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby Daniele Varsano » Thu Aug 22, 2019 4:17 pm

Dear Shudong,
"Please note that I used a block size of 1 Ry"

OK, but the problem here is not the kernel build, which completed correctly, but the solver.
I can see that, although this is a simple 2D MoS2, your BSE matrix has dimension 19200. I think the inversion allocates two matrices of this size, which together take approximately 11 GB, and this memory is not distributed. Do you have 11 GB of memory per core?
My suggestions are:
1) Try to run the inversion runlevel in serial: this way you will have all the memory of the node at your disposal.
2) Reduce the BSE matrix, e.g. by restricting the bands to 23-30, and see if the calculation runs successfully.
3) I do not know if it will help, but you can try updating the code to a more recent version: this will help us in case some debugging is needed.
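
To see why suggestion 1 might help, here is a rough Python sketch that compares the estimated solver memory with the share available to each MPI task. It rests on several assumptions that are not confirmed in this thread: that the 256 GB mentioned above is the memory of a single node, that the 32 tasks from BS_CPU all run on that node and split its memory evenly, and that nothing else (kernel, wavefunctions, etc.) competes for memory. The function name fits_per_task is hypothetical and not part of Yambo.

```python
def fits_per_task(matrix_dim, node_mem_gib, tasks_per_node,
                  bytes_per_element=16, n_matrices=2):
    """Compare the non-distributed solver matrices with one task's memory share.

    Assumes dense matrix_dim x matrix_dim complex matrices and an even split
    of node memory among MPI tasks (a simplification).
    """
    needed_gib = n_matrices * matrix_dim**2 * bytes_per_element / 1024**3
    share_gib = node_mem_gib / tasks_per_node
    return needed_gib, share_gib, needed_gib <= share_gib


if __name__ == "__main__":
    # 32 tasks (as in BS_CPU= "1 1 32") versus a serial run on the same node.
    for tasks in (32, 1):
        needed, share, ok = fits_per_task(19200, 256, tasks)
        verdict = "fits" if ok else "does not fit"
        print(f"{tasks:2d} task(s): need {needed:.1f} GiB, "
              f"share {share:.1f} GiB -> {verdict}")
```

Under these assumptions each of the 32 tasks gets 8 GiB, which is below the roughly 11 GiB estimate, while a serial run has the whole node available. This is only a lower bound on what the solver needs, so a run can still fail even when the check passes.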

Best,
Daniele

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby sdwang » Fri Aug 23, 2019 5:39 am

Dear Daniele,
I removed the parallel settings and it still does not work. When I reduced the BSE matrix to 6400 (bands 23-30), it works. But if I need more transitions, I have to include more bands than 23-30, and the matrix dimension increases again... I used the double-precision version and included SOC for MoS2; does this affect the problem?
PS: how did you work out that a matrix dimension of 19200 corresponds to about 11 GB?

Thanks!

Best,

Shudong

Re: BSE stop with "Allocation of K_slk%blc failed"

Postby Daniele Varsano » Sat Aug 24, 2019 9:13 am

Dear Shudong,
"I used the double-precision version and included SOC for MoS2; does this affect the problem?"

Yes, it matters, as both features increase the memory needed. SOC is needed for MoS2 anyway, but you could try working in single precision: you will probably not lose accuracy.

"PS: how did you work out that a matrix dimension of 19200 corresponds to about 11 GB?"

Just a rough calculation: the number of elements in the N x N matrix, times 16 bytes (double-precision complex numbers), divided by (1024)^3 to get the size in GB. For the inversion you need to allocate two matrices of that size.
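
The rough calculation above can be written as a short Python sketch. The function name dense_complex_matrix_gib is hypothetical, and the factor of two for the inversion solver is the estimate given in this thread rather than a documented Yambo figure. The single-precision value assumes 8 bytes per complex element, which is half the double-precision size.

```python
def dense_complex_matrix_gib(n, bytes_per_element=16, n_matrices=1):
    """Memory in GiB for n_matrices dense n x n complex matrices.

    bytes_per_element: 16 for double-precision complex, 8 for single.
    """
    return n_matrices * n * n * bytes_per_element / 1024**3


if __name__ == "__main__":
    # Matrix dimensions quoted in this thread: the full run and bands 23-30.
    for n in (19200, 6400):
        double = dense_complex_matrix_gib(n, bytes_per_element=16, n_matrices=2)
        single = dense_complex_matrix_gib(n, bytes_per_element=8, n_matrices=2)
        print(f"N = {n}: {double:.2f} GiB (double), {single:.2f} GiB (single)")
```

For N = 19200 in double precision this gives about 11 GiB for the two matrices, matching the figure quoted earlier. Because the cost grows as N squared, cutting the dimension to 6400 (a factor of 3) reduces the estimate by a factor of 9.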

Best,
Daniele

