YAMBO parallel for large system


Postby jyin002 » Wed Sep 14, 2016 8:22 am

Dear YAMBO developers,
I am trying to use the GW0 method, as implemented in YAMBO (v 3.4.2), to calculate quasiparticle-corrected electronic bands of a solid. Please see the GW input file below:
gw0 # [R GW] GoWo Quasiparticle energy levels
ppa # [R Xp] Plasmon Pole Approximation
em1d # [R Xd] Dynamical Inverse Dielectric Matrix
HF_and_locXC # [R XX] Hartree-Fock Self-energy and Vxc
EXXRLvcs= 20 Ry # [XX] Exchange RL components
Chimod= "Hartree" # [X] IP/Hartree/ALDA/LRC/BSfxc
% QpntsRXp
1 | 34 | # [Xp] Transferred momenta
%
% BndsRnXp
1 | 280 | # [Xp] Polarization function bands
%
NGsBlkXp= 1 Ry # [Xp] Response block size
% LongDrXp
1.000000 | 0.000000 | 0.000000 | # [Xp] [cc] Electric Field
%
PPAPntXp= 27.21138 eV # [Xp] PPA imaginary energy
% GbndRnge
1 | 280 | # [GW] G[W] bands range
%
GDamping= 0.100000 eV # [GW] G[W] damping
dScStep= 0.100000 eV # [GW] Energy step to evaluate Z factors
DysSolver= "n" # [GW] Dyson Equation solver (`n`,`s`,`g`)
%QPkrange # [GW] QP generalized Kpoint/Band indices
1| 1| 210| 222|
%
%QPerange # [GW] QP generalized Kpoint/Energy indices
1| 34| 0.0|-1.0|
%


The job runs well only when I use a small number of cores (e.g. 8 or fewer) on the HPC cluster:
srun --ntasks=8 --hint=nomultithread --ntasks-per-node=8 --ntasks-per-socket=4 --ntasks-per-core=1 --mem_bind=v,local ${YAMBO_HOME}/bin/yambo -F INPUTS/06_BSE -J 06_BSE

But if I increase the number of cores to 32 (one node) or more, it always stalls at this point:
<---> [01] Files & I/O Directories
<---> [02] CORE Variables Setup
<---> [02.01] Unit cells
<01s> [02.02] Symmetries
<01s> [02.03] RL shells
<01s> [02.04] K-grid lattice
<01s> [02.05] Energies [ev] & Occupations
<01s> [03] Transferred momenta grid
<01s> [04] Bare local and non-local Exchange-Correlation
<01s> [Distribute] Average allocated memory is [o/o]: 7.502401
<01s> [M 0.773 Gb] Alloc WF ( 0.721)
<02s> [FFT-HF/Rho] Mesh size: 30 30 95
<02s> [WF-HF/Rho loader] Wfs (re)loading | | [000%] --(E) --(X)
<02s> [M 0.996 Gb] Alloc wf_disk ( 0.222)
<08s> [WF-HF/Rho loader] Wfs (re)loading |# | [009%] 05s(E) 58s(X)
<14s> [WF-HF/Rho loader] Wfs (re)loading |#### | [020%] 11s(E) 56s(X)
<19s> [WF-HF/Rho loader] Wfs (re)loading |###### | [032%] 17s(E) 53s(X)
<25s> [WF-HF/Rho loader] Wfs (re)loading |######## | [043%] 23s(E) 52s(X)
<31s> [WF-HF/Rho loader] Wfs (re)loading |########### | [055%] 28s(E) 51s(X)
<37s> [WF-HF/Rho loader] Wfs (re)loading |############# | [067%] 34s(E) 50s(X)
<42s> [WF-HF/Rho loader] Wfs (re)loading |############### | [079%] 40s(E) 50s(X)
<48s> [WF-HF/Rho loader] Wfs (re)loading |################## | [091%] 46s(E) 50s(X)
<51s> [WF-HF/Rho loader] Wfs (re)loading |####################| [100%] 49s(E) 49s(X)
<51s> [M 0.775 Gb] Free wf_disk ( 0.222)
<51s> EXS | | [000%] --(E) --(X)
<56s> P001: EXS |### | [016%] 05s(E) 29s(X)
<01m-01s> P001: EXS |###### | [033%] 10s(E) 29s(X)
<01m-06s> P001: EXS |########## | [050%] 15s(E) 29s(X)
<01m-11s> P001: EXS |############# | [067%] 20s(E) 29s(X)
<01m-16s> P001: EXS |################ | [084%] 25s(E) 29s(X)
<01m-20s> P001: EXS |####################| [100%] 28s(E) 28s(X)
<01m-20s> [xc] Functional Perdew, Burke & Ernzerhof(X)+Perdew, Burke & Ernzerhof(C)
<01m-20s> [xc] LIBXC used to calculate xc functional
<01m-20s> [M 0.052 Gb] Free WF ( 0.721)
<01m-21s> [05] Dynamic Dielectric Matrix (PPA)
<01m-21s> [Distribute] Average allocated memory is [o/o]: 77.85714


It seems that increasing the number of cores does not accelerate the calculation. I would like to know how to handle a large system with more than one thousand cores and run the jobs in parallel more efficiently.
Jun Yin
Nanyang Technological University

Re: YAMBO parallel for large system

Postby Daniele Varsano » Wed Sep 14, 2016 8:35 am

Dear Jun Yin,

in order to use a large number of cores efficiently you need to update to the 4.x release, where the parallelism has been completely revised.
In 4.x you have full flexibility in the parallelization strategy; you can have a look at a simple tutorial here:
http://www.yambo-code.org/tutorials/Parallel/index.php

To activate the variables governing the parallelism you need to add "-V par" on the command line when building the input file.
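For instance, assuming a PPA GW run like yours, an input file exposing the parallel variables could be generated with a command along these lines (a sketch with 4.x command-line flags; adapt the file name to your case):

yambo -x -g n -p p -V par -F gw_par.in

This should add the X_all_q_ROLEs/X_all_q_CPU and SE_ROLEs/SE_CPU variables (among others) to the generated input, so you can assign the MPI tasks explicitly.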

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

Re: YAMBO parallel for large system

Postby jyin002 » Wed Sep 14, 2016 9:32 am

Dear Daniele,

Thanks a lot for your suggestions. I just tried the same calculation (2 nodes, 64 cores) with YAMBO 4.0.2 by adding these extra lines to the GW input file:

X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= "2 4 4 2" # Parallelism over q points only
SE_ROLEs= "q qp b"
SE_CPU= "2 8 2" # Parallilism over q points only


It stopped abnormally, and one of the files in the LOG folder shows:

<01s> P0001: [01] CPU structure, Files & I/O Directories
<01s> P0001: CPU-Threads:64(CPU)-1(threads)-1(threads@X)-1(threads@DIP)-1(threads@SE)-1(threads@RT)-1(threads@K)
<01s> P0001: CPU-Threads:X_all_q(environment)-2 4 4 2(CPUs)-q k c v(ROLEs)
<01s> P0001: CPU-Threads:SE(environment)-2 8 2(CPUs)-q qp b(ROLEs)
<01s> P0001: [02] CORE Variables Setup
<01s> P0001: [02.01] Unit cells
<02s> P0001: [02.02] Symmetries
<02s> P0001: [02.03] RL shells
<02s> P0001: [02.04] K-grid lattice
<02s> P0001: [02.05] Energies [ev] & Occupations
<02s> P0001: [03] Transferred momenta grid
<02s> P0001: [M 0.052 Gb] Alloc bare_qpg ( 0.020)
<02s> P0001: [04] External corrections
<03s> P0001: [05] Dynamic Dielectric Matrix (PPA)
<03s> P0001: [PARALLEL Response_G_space for K(bz) on 4 CPU] Loaded/Total (Percentual):16/64(25%)
<03s> P0001: [PARALLEL Response_G_space for Q(ibz) on 2 CPU] Loaded/Total (Percentual):17/34(50%)
<03s> P0001: [PARALLEL Response_G_space for K(ibz) on 1 CPU] Loaded/Total (Percentual):34/34(100%)
<03s> P0001: [PARALLEL Response_G_space for CON bands on 4 CPU] Loaded/Total (Percentual):16/64(25%)
<03s> P0001: [PARALLEL Response_G_space for VAL bands on 2 CPU] Loaded/Total (Percentual):108/216(50%)
<03s> P0001: Matrix Inversion uses 1 CPUs
<03s> P0001: Matrix Diagonalization uses 1 CPUs
<03s> P0001: [DIP] Checking dipoles header
<03s> P0001: [x,Vnl] computed using 1732 projectors
<03s> P0001: [WARNING] [x,Vnl] slows the Dipoles computation. To neglect it rename the ns.kb_pp file
<03s> P0001: [M 4.194 Gb] Alloc KBV ( 4.129)
<03s> P0001: [M 6.707 Gb] Alloc WF ( 2.513)
<03s> P0001: [PARALLEL distribution for Wave-Function states] Loaded/Total(Percentual):4216/9520(44%)
<05s> P0001: [WF] Performing Wave-Functions I/O from ./SAVE
<05s> P0001: [M 6.782 Gb] Alloc wf_disk ( 0.074)
<05s> P0001: Reading wf_fragments_1_1
<05s> P0001: Reading wf_fragments_1_2
<06s> P0001: Reading wf_fragments_1_3
<06s> P0001: Reading wf_fragments_2_1
<06s> P0001: Reading wf_fragments_2_2
<07s> P0001: Reading wf_fragments_2_3
<07s> P0001: Reading wf_fragments_3_1
<08s> P0001: Reading wf_fragments_3_2
<08s> P0001: Reading wf_fragments_3_3


Could you please help me figure out the problem? Thank you again.
Jun Yin
Nanyang Technological University

Re: YAMBO parallel for large system

Postby Daniele Varsano » Wed Sep 14, 2016 9:48 am

Dear Jun,

in order to spot the problem, the complete input/report files and the error message would help.
Anyway, from your post I can see that:
1) X_all_q_CPU and SE_CPU are inconsistent (the first corresponds to 64 CPUs, the second to 32).
2) You are allocating 6.782 Gb; check that you have that much RAM available per core.
3) As a tip, avoid parallelization over q: it usually unbalances the calculation, so set the q role to 1.
4) To distribute the memory, try to parallelize over bands (c, v); a possible layout is sketched below.
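As an illustration only (not a tested setup), a consistent assignment of 64 MPI tasks with no q parallelism and most of the load on the band indices could look like:

X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= "1 4 8 2" # 1*4*8*2 = 64 tasks, mainly over bands
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= "1 8 8" # 1*8*8 = 64 tasks

The splitting over c, v, qp and b should be adapted to your actual numbers of conduction/valence bands and quasiparticle states, keeping the product of each CPU string equal to the total number of MPI tasks.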

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

