Dear all,
I am trying to run a G0W0 calculation on a linear chain of 200 atoms along the z axis, using the full-frequency approach. I am running on the GALILEO machines with a mixed MPI+OpenMP parallelization in order to avoid out-of-memory errors.
This is the input file, together with the bash script for submitting the parallel job:
#!/bin/bash
#SBATCH -N 5 # number of nodes
#SBATCH --mem=118000 # memory 86000MB for cache/flat nodes
#SBATCH --time=24:00:00 # time limit: 24 hours
#SBATCH --tasks-per-node=6
#SBATCH --cpus-per-task=6
gw0 # [R GW] GoWo Quasiparticle energy levels
rim_cut # [R RIM CUT] Coulomb potential
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
X_nCPU_LinAlg_INV= $ncpu
X_Threads=0 # [OPENMP/X] Number of threads for response functions
SE_Threads=0 # [OPENMP/GW] Number of threads for selfenergy
DIP_Threads=0
RandQpts=0 # [RIM] Number of random qpoints in the BZ
RandGvec= 1 RL # [RIM] Coulomb interaction RS components
CUTGeo= "ws Z" # [CUT] Coulomb Cutoff geometry: box/cylinder/sphere/ws X/Y/Z/XY..
CUTwsGvec= 1.1000 # [CUT] WS cutoff: number of G to be modified
EXXRLvcs= 50 Ry # [XX] Exchange RL components
VXCRLvcs= 424401 RL # [XC] XCpotential RL components
Chimod= "HARTREE" # [X] IP/Hartree/ALDA/LRC/PF/BSfxc
% GbndRnge
1  500  # [GW] G[W] bands range
%
XTermKind = "BG"
GDamping= 0.10000 eV # [GW] G[W] damping
dScStep= 0.10000 eV # [GW] Energy step to evaluate Z factors
% BndsRnXd
1  500  # [Xd] Polarization function bands
%
GTermKind = "BG"
NGsBlkXd= 3 Ry # [Xd] Response block size
% DmRngeXd
0.20000  0.20000  eV # [Xd] Damping range
%
ETStpsXd= 100 # [Xd] Total Energy steps
% LongDrXd
1.000000  1.000000  1.000000  # [Xd] [cc] Electric Field
%
DysSolver= "n" # [GW] Dyson Equation solver ("n","s","g")
%QPkrange # [GW] QP generalized Kpoint/Band indices
1|1|399|402|
%
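As a rough cross-check of the batch settings above, the following Python sketch works out how many MPI tasks, cores, and MB of memory per task the request corresponds to. It assumes that the memory request applies per node and is shared evenly among the tasks on that node; the function name `job_layout` is purely illustrative and is not part of SLURM or Yambo.

```python
def job_layout(nodes, tasks_per_node, cpus_per_task, mem_per_node_mb):
    """Summarize the resources implied by a SLURM-style request.

    Assumption: the memory request is per node and is split evenly
    among the MPI tasks running on that node.
    """
    mpi_tasks = nodes * tasks_per_node
    return {
        "mpi_tasks": mpi_tasks,
        "total_cores": mpi_tasks * cpus_per_task,
        "mem_per_task_mb": mem_per_node_mb / tasks_per_node,
    }


# Values taken from the batch script above.
layout = job_layout(nodes=5, tasks_per_node=6, cpus_per_task=6, mem_per_node_mb=118000)
for key, value in layout.items():
    print(f"{key}: {value:.0f}")
```

Under these assumptions the request corresponds to 30 MPI tasks, consistent with the "MPI CPU : 30" line of the report, with a little under 20 GB available to each task.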
Before starting the computation I reduced the number of RL vectors from 42441312 to 40000 RL, in order to reduce the computational time.
The output is the following:
[01] CPU structure, Files & I/O Directories
===========================================
* CPUThreads :30(CPU)1(threads)1(threads@X)1(threads@DIP)1(threads@SE)1(threads@RT)1(threads@K)1(threads@NL)
* MPI CPU : 30
* THREADS (max): 1
* THREADS TOT(max): 30
* I/O NODES : 5
* NODES(computing): 5
* (I/O): 1
* Fragmented WFs : yes
CORE databases in .
Additional I/O in .
Communications in .
Input file is gw_all_BZ_ff.in
Report file is ./rall_BZ_ff_em1d_HF_and_locXC_gw0_rim_cut
Precision is SINGLE
Log files in ./LOG
Job string(s)dir(s) (main): all_BZ_ff
[RD./SAVE//ns.db1]
Bands : 500
Kpoints : 1
Gvectors [RL space]: 42441312
Components [wavefunctions]: 1513945
Symmetries [spatial+Trev]: 16
Spinor components : 1
Spin polarizations : 1
Temperature [ev]: 0.000000
Electrons : 800.0000
WF Gvectors : 1513945
Max atoms/species : 200
No. of atom species : 1
Exact exchange fraction in XC : 0.000000
Exact exchange screening in XC : 0.000000
Magnetic symmetries : no
 S/N 000347  v.04.05.01 r.00165 
[04] Coloumb potential CutOff :ws
=================================
Cut directions :Z
WS Cutoff [units to be defined]: 1.100000
Symmetry test passed :yes
Cutoff: 1.100000
n grid: 4 4 84
WS Direct Lattice(DL) unit cell [iru / cc(a.u.)]
A1 = 1.000000 0.000000 0.000000 18.89727 0.000000 0.000000
A2 = 0.000000 1.000000 0.000000 0.000000 18.89727 0.000000
A3 = 0.000000 0.000000 1.000000 0.000000 0.000000 478.8568
[WR./all_BZ_ff//ndb.cutoff]
Brillouin Zone Q/K grids (IBZ/BZ): 1 1 1 1
CutOff Geometry :ws z
Coulomb cutoff potential :ws z 1.100
Box sides length [au]: 0.00 0.00 0.00
Sphere/Cylinder radius [au]: 0.000000
Cylinder length [au]: 0.000000
RL components : 399997
RL components used in the sum : 399997
RIM corrections included :no
RIM RL components :0
RIM random points :0
 S/N 000347  v.04.05.01 r.00165 
[05] Dipoles
============
[WARNING] DIPOLES database not correct or not present
[RD./SAVE//ns.kb_pp_pwscf]
Fragmentation :yes
 S/N 000347  v.04.05.01 r.00165 
[WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
[WFOscillators/G space] Performing WaveFunctions I/O from ./SAVE
[WFOscillators/G space loader] Normalization (few states) min/max :0.865E11 1.00
[WR./all_BZ_ff//ndb.dipoles]
Brillouin Zone Q/K grids (IBZ/BZ): 1 1 1 1
RL vectors (WF): 399997
Fragmentation :yes
Electronic Temperature [K]: 0.000000
Bosonic Temperature [K]: 0.000000
X band range : 1 500
X band range limits : 400 1
X e/h energy range [ev]:1.000000 1.000000
RL vectors in the sum : 399997
[r,Vnl] included :yes
Bands ordered :yes
Direct v evaluation :no
Field momentum norm :0.1000E4
Approach used :Gspace v
Dipoles computed :R V P
Wavefunctions :Perdew, Burke & Ernzerhof(X)+Perdew, Burke & Ernzerhof(C)
 S/N 000347  v.04.05.01 r.00165 
Timing [Min/Max/Average]: 02h15m40s/02h15m43s/02h15m42s
[06] Dynamical Dielectric Matrix
================================
However, the computation stops with the following error:
[ERROR] STOP signal received while in :[06] Dynamical Dielectric Matrix
[ERROR]Allocation of X_mat failed
In the LOG directory I found:
<02h16m35s> P1r039c02s08: [06] Dynamical Dielectric Matrix
<03h39m08s> P1r039c02s08: Response_G_space parallel ENVIRONMENT is incomplete. Switching to defaults
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for K(bz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for Q(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for Kq(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h39m11s> P1r039c02s08: [LA] SERIAL linear algebra
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for K(ibz) on 1 CPU] Loaded/Total (Percentual):1/1(100%)
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for CON bands on 5 CPU] Loaded/Total (Percentual):100/500(20%)
<03h39m11s> P1r039c02s08: [PARALLEL Response_G_space for VAL bands on 3 CPU] Loaded/Total (Percentual):134/400(34%)
P1r039c02s08: [ERROR] STOP signal received while in :[06] Dynamical Dielectric Matrix
P1r039c02s08: [ERROR]Allocation of X_mat failed
Am I doing something wrong? Do you have any suggestions for overcoming this problem?
Sincerely,
Davide Romanin

PhD student in Physics XXXIII cycle
Representative of the PhD students in Physics
Applied Science and Technology department (DiSAT)
Politecnico di Torino
Corso Duca degli Abruzzi, 24
10129 Torino ITALY

Error on Allocation of X_mat for parallel computation
 Daniele Varsano
Re: Error on Allocation of X_mat for parallel computation
Dear Davide,
In general, full-frequency calculations are very demanding, and you are dealing with a large system; moreover, it is quite hard to converge them with respect to the number of frequencies. Are you sure you need full frequency rather than the plasmon-pole approximation? In general, I would discourage its use unless the plasmon-pole approximation is known to fail.
Anyway, what makes the calculation so demanding here are the following variables:
BndsRnXd
NGsBlkXd
ETStpsXd
You can try to reduce one of these parameters and see if the calculation fits on your machine.
My suggestion is to compile the code with the following configure flag (if you have not already done so):
--enable-memory-profile
The log files will then report the memory allocated so far, which gives you an idea of how much memory you need.
Please note that the terminator techniques (GTermKind, XTermKind) do not apply in full-frequency calculations.
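To get a feeling for how the response block size and the number of frequencies drive the memory, here is a minimal Python sketch. It assumes that the dielectric matrix is stored as one dense n_g x n_g complex block per frequency in single precision (8 bytes per element, in line with "Precision is SINGLE" in the report); how the code actually distributes this array among the MPI tasks is not considered. The number of G-vectors corresponding to NGsBlkXd= 3 Ry is not given in this thread, so the n_g values below are purely hypothetical, as is the function name `x_mat_bytes`.

```python
def x_mat_bytes(n_g, n_freq, bytes_per_element=8):
    """Size in bytes of n_freq dense (n_g x n_g) complex matrices.

    Assumption: single-precision complex numbers, i.e. 8 bytes each.
    """
    return n_g * n_g * n_freq * bytes_per_element


n_freq = 100  # ETStpsXd from the input file
for n_g in (2000, 5000, 10000):  # hypothetical block sizes
    gib = x_mat_bytes(n_g, n_freq) / 1024**3
    print(f"n_g={n_g:6d}  n_freq={n_freq}  ~{gib:8.1f} GiB")
```

In this estimate, halving the number of frequencies halves the size, while halving the number of G-vectors in the block reduces it by a factor of four. It covers only the matrix itself, not the wavefunctions or any other arrays.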
Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.maxcentre.eu/

Re: Error on Allocation of X_mat for parallel computation
Dear Daniele,
Thank you for your reply!
Yes, I thought about using the plasmon-pole approximation, but some of the chains I have to study are metallic, and I read that the PP approximation fails in that case. Am I wrong?
Anyway, I will try to adjust the parameters you mentioned and I will let you know!
Thanks,
Davide


