NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Fri Jul 10, 2015 12:53 pm

Hi!
I'm an administrator for our university HPC system and was asked to set up Yambo 4.0.1 on our UniversalCluster.
And: I'm not a scientist, so I'm not familiar with Yambo at all :)

Unfortunately, I get this error message when I try to run the MPI example provided on your website:

Code:
 <01s> [01] CPU structure, Files & I/O Directories
 <01s> [02] CORE Variables Setup
 <01s> [02.01] Unit cells
 <01s> [02.02] Symmetries
 <01s> [02.03] RL shells
 <01s> [02.04] K-grid lattice
 <01s> [02.05] Energies [ev] & Occupations
 <01s> [03] Transferred momenta grid
 <01s> [04] External corrections
 <02s> [05] Dynamic Dielectric Matrix (PPA)
 <03s> [M  0.203 Gb] Alloc WF ( 0.188)
 <03s> [WF] Performing Wave-Functions I/O
  <04s> [M  0.203 Gb] Free wf_disk ( 0.011)
 <04s> [X-CG] R(p) Tot o/o(of R)  :  1925   3750    100
 <04s> Xo@q[1] |                                        | [000%] --(E) --(X)
 <09s> Xo@q[1] |#############################           | [073%] 05s(E) 06s(X)
 <11s> Xo@q[1] |########################################| [100%] 06s(E) 06s(X)
 <11s> X@q[1] |                                        | [000%] --(E) --(X)
 <11s> X@q[1] |########################################| [100%] --(E) --(X)
 <12s> [M  0.010 Gb] Free WF ( 0.188)
 <12s> [06] Bare local and non-local Exchange-Correlation
 <13s> [M  0.107 Gb] Alloc WF ( 0.097)
 <13s> [WF] Performing Wave-Functions I/O
 <13s> [FFT-HF/Rho] Mesh size:  75   75   12
 <13s> [M  0.119 Gb] Alloc wf_disk ( 0.011)
 <13s> [M  0.108 Gb] Free wf_disk ( 0.011)
 <13s> EXS |                                        | [000%] --(E) --(X)
 <18s> EXS |#########                               | [023%] 05s(E) 21s(X)
 <23s> EXS |###################                     | [047%] 10s(E) 20s(X)
 <28s> EXS |############################            | [071%] 15s(E) 21s(X)
 <33s> EXS |######################################  | [095%] 20s(E) 20s(X)
 <34s> EXS |########################################| [100%] 20s(E) 20s(X)
 <34s> [xc] Functional Slater exchange(X)+Perdew & Zunger(C)
 <34s> [xc] LIBXC used to calculate xc functional
 <34s> [M  0.010 Gb] Free WF ( 0.097)
 <35s> [06.01] HF occupations report
 <35s> [07] Dyson equation: Newton solver
 <35s> [07.01] G0W0 : the P(lasmon) P(ole) A(pproximation)
 <36s> [M  0.198 Gb] Alloc WF ( 0.188)
 <36s> [WF] Performing Wave-Functions I/O
 <36s> [FFT-GW] Mesh size:  54   54    9
 <36s> [M  0.209 Gb] Alloc wf_disk ( 0.011)
 <37s> [M  0.198 Gb] Free wf_disk ( 0.011)
 <37s> G0W0 PPA |                                        | [000%] --(E) --(X)
 <43s> G0W0 PPA |#                                       | [002%] 06s(E) 04m-07s(X)
 <49s> G0W0 PPA |##                                      | [005%] 12s(E) 04m-05s(X)
[ERROR] STOP signal received while in :[07.01] G0W0 : the P(lasmon) P(ole) A(pproximation)
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound


[ERROR] STOP signal received while in :[07.01] G0W0 : the P(lasmon) P(ole) A(pproximation)
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound


Here's an excerpt of my build script:

Code:
[...]
# Toolchain: Intel 14.0 compilers, Open MPI 1.8, MKL 11.1.4 and the NetCDF-Fortran 4.4.2 module
module load compiler/intel/14.0
module load mpi/openmpi/1.8-intel-14.0
module load numlib/mkl/11.1.4
module load lib/netcdf/4.4.2_fortran-openmpi-1.8-intel-14.0
./configure --enable-dp --enable-openmpi --enable-netcdf --enable-netcdf-LFS \
    --with-blas-libs="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
    --with-lapack-libs="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core" \
    --prefix=${TARGET_DIR}/bin 2>&1 | tee ${LOG_DIR}/configure.out
make all 2>&1 | tee ${LOG_DIR}/make_all.out     # approx. 24 min.
[...]


And here's a snippet of the submit script for MOAB (MPI):

Code:
[...]
echo " "
echo "### initializing yambo db..."
# "Number of cores allocated to job: $MOAB_PROCCOUNT"
yambo    # serial initialization (setup) run
mpiexec -quiet -n ${MOAB_PROCCOUNT} yambo -F Inputs/02_QP_PPA_pure-mpi-q -J 02_QP_PPA_pure-mpi-q    # -F: input file, -J: job name
[ "$?" -eq 0 ] && echo "all clean..." || echo "ERROR!"    # $? is the exit status of mpiexec


I tried to follow this example:

http://www.yambo-code.org/tutorials/Parallel/index.php

BTW: There's a dead link here: [01] Initialization: 01_init (yambo -i -V RL)

ANY IDEAS WHAT I DID WRONG?

:-)
Last edited by RainerRutka on Tue Jul 14, 2015 12:43 pm, edited 1 time in total.
Rainer Rutka
University of Konstanz
Communication, Information, Media Centre (KIM)
High-Performance-Computing (HPC) [Room V511]
78457 Konstanz, Germany
+49 7531 88-5413
RainerRutka
Posts: 8
Joined: Fri Jul 10, 2015 10:23 am
Location: University of Konstanz, Germany

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Tue Jul 14, 2015 10:20 am

OK, no solution so far.
So I'll compile Yambo without MPI! :-(

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby Daniele Varsano » Tue Jul 14, 2015 10:29 am

Dear Rainer,

can you please post your complete input and report files, and the config.log as well?
I cannot tell yet whether the problem is related to the parallelization or to the NetCDF libraries. Which version of NetCDF are you using?
Also, is the value of ${MOAB_PROCCOUNT} compatible with the parallelization set in the input file?
Please post the files and we will have a look.
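For the version question, a minimal sketch, assuming the netCDF-C and netCDF-Fortran helper scripts `nc-config` and `nf-config` are on the PATH once the modules from the build script are loaded. The function name `report_netcdf` is hypothetical; it only reports what the current module environment exposes, not what the yambo binary was actually linked against.

```shell
#!/bin/bash
# Hypothetical helper: print the NetCDF versions visible in the current module environment.
# nc-config ships with netCDF-C, nf-config with netCDF-Fortran; either one may be missing.
report_netcdf() {
  local tool
  for tool in nc-config nf-config; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $("$tool" --version)"
    else
      echo "$tool: not found in PATH"
    fi
  done
}

# Run it inside the same module environment that was used for the build.
report_netcdf
```

Printing exactly one line per tool keeps the output easy to paste into a forum reply next to config.log.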

Please add your affiliation to your signature; this is a forum rule.

Best,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Daniele Varsano
Posts: 2060
Joined: Tue Mar 17, 2009 2:23 pm

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Tue Jul 14, 2015 12:48 pm

Hi Daniele!
If I try to attach a file, I get 'extension not allowed'.
It doesn't matter whether I use *.taz, *.tar.z, xxxxnnnn_taz or anything else.
.--)(

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby Daniele Varsano » Tue Jul 14, 2015 1:40 pm

Hi Rainer,
.tar.gz should be allowed. Otherwise, just rename the files with a .txt extension.

Daniele

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Tue Jul 14, 2015 1:52 pm

Hi Daniele!

Here are the requested files as a compressed tar archive:

https://depot.uni-konstanz.de/cgi-bin/exchange.pl?g=pk2vdz8n86

Grazie mille (many thanks)!

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby Daniele Varsano » Tue Jul 14, 2015 2:08 pm

Dear Rainer,
a possible reason for the crash is a mismatch between the parallelization set in the input file and the resources you request.
It looks like you are asking for 16 CPUs:
Code:
nodes=2:ppn=8

as can also be seen from the LOG files,
while the input file is meant for a pure MPI run with 8 CPUs:
Code:
X_all_q_ROLEs= "q k c v"            # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= "8 1 1 1"              # Parallelism over q points only
SE_ROLEs= "q qp b"
SE_CPU= "8 1 1"                     # Parallelism over q points only


Hopefully, just requesting 8 CPUs will solve the problem.
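The consistency check above can be scripted. A minimal sketch, assuming (as the reply implies) that the product of the numbers in each `*_CPU` string must equal the number of MPI tasks: `X_all_q_CPU= "8 1 1 1"` gives 8 x 1 x 1 x 1 = 8, whereas `nodes=2:ppn=8` launches 2 x 8 = 16 tasks. The function names `cpu_product` and `check_cpu_strings` are hypothetical; only lines of the form `NAME_CPU= "n1 n2 ..."` are read, and empty strings are skipped.

```shell
#!/bin/bash
# Hypothetical pre-flight check. Assumption: for every *_CPU string in a Yambo input file,
# the product of its entries must equal the number of MPI tasks launched by mpiexec.

# Multiply the whitespace-separated integers of a CPU string, e.g. "8 1 1 1" -> 8.
cpu_product() {
  local n prod=1
  for n in $1; do            # unquoted on purpose: split the string into its entries
    prod=$((prod * n))
  done
  echo "$prod"
}

# Print one MISMATCH line per *_CPU string whose product differs from the task count;
# return 1 if any mismatch was found. Empty strings ("") are skipped.
check_cpu_strings() {
  local input=$1 ntasks=$2 lines line var cpus prod status=0
  lines=$(grep -E '^[[:space:]]*[A-Za-z_]+_CPU[[:space:]]*=' "$input" || true)
  [ -z "$lines" ] && return 0
  while IFS= read -r line; do
    var=${line%%=*}
    cpus=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
    [ -z "$cpus" ] && continue
    prod=$(cpu_product "$cpus")
    if [ "$prod" -ne "$ntasks" ]; then
      echo "MISMATCH: $var uses $prod CPUs, but $ntasks MPI tasks are requested"
      status=1
    fi
  done <<EOF
$lines
EOF
  return $status
}

# Possible use in the MOAB job script, before calling mpiexec:
#   check_cpu_strings Inputs/02_QP_PPA_pure-mpi-q "${MOAB_PROCCOUNT}" || exit 1
```

Failing before mpiexec starts turns a resource mismatch into an explicit message instead of a crash in the middle of the run; it does not, of course, rule out other causes of the NetCDF error.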

Best,

Daniele

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Tue Jul 14, 2015 2:37 pm

Hi!

Here is the example with 8 CPUs:

Job name: yambo_job
Number of nodes allocated to job: 2
Number of cores allocated to job: 8

Code:
[...]
#MSUB -l nodes=2:ppn=4          # 2 nodes x 4 processors/node = 8 CPUs
[...]


so I run Yambo this way ($MOAB_PROCCOUNT = 8):

Code:
[...]
mpiexec -n ${MOAB_PROCCOUNT} yambo -F Inputs/02_QP_PPA_pure-mpi-q -J 02_QP_PPA_pure-mpi-q
[...]


---------------------------------------------------------------------------------------------------------------------------

Code:
./LOG:
total 36
-rw-r--r--. 1 kn_pop235844 kn_kn 5204 14. Jul 15:33 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_1
-rw-r--r--. 1 kn_pop235844 kn_kn 3522 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_2
-rw-r--r--. 1 kn_pop235844 kn_kn 3522 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_3
-rw-r--r--. 1 kn_pop235844 kn_kn 3522 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_4
-rw-r--r--. 1 kn_pop235844 kn_kn 3522 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_5
-rw-r--r--. 1 kn_pop235844 kn_kn 3523 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_6
-rw-r--r--. 1 kn_pop235844 kn_kn 3523 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_7
-rw-r--r--. 1 kn_pop235844 kn_kn 3523 14. Jul 15:32 l-02_QP_PPA_pure-mpi-q_em1d_ppa_HF_and_locXC_gw0_CPU_8


Same error

BZ energy Double Grid :no
BZ energy DbGd points :0
PPA Im energy [ev]: 27.21138
- S/N 007316 --------------------------- v.04.00.01 r.0088 -
[ERROR] STOP signal received while in :[07.01] G0W0 : the P(lasmon) P(ole) A(pproximation)
[ERROR][NetCDF] NetCDF: Start+count exceeds dimension bound


:-(

Thanks, Rainer

---------------------------------------------------------------------------------------------------------------------------

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby Daniele Varsano » Tue Jul 14, 2015 2:46 pm

Dear Rainer,
did you remove the 02_QP_PPA_pure-mpi-q directory before running the job?
Can you also post the report and log files of the last run?
Daniele
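To make the cleanup explicit, a minimal sketch, assuming that the job string given with `-J` names a directory in the run directory where Yambo keeps the databases of that job, separate from the input file `Inputs/02_QP_PPA_pure-mpi-q` passed with `-F`. The function name `clean_job_dir` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper. Assumption: "yambo -J <job>" writes the databases of a run into a
# directory called <job> in the working directory, while the input stays in Inputs/<job>.
clean_job_dir() {
  local job=$1
  if [ -d "$job" ]; then
    rm -rf -- "$job"          # delete the stale job directory only
    echo "removed directory $job"
  else
    echo "no directory named $job here; nothing removed"   # a plain file of that name is never deleted
  fi
  if [ -f "Inputs/$job" ]; then
    echo "kept input file Inputs/$job"   # still needed by: yambo -F Inputs/$job
  fi
}

# Possible use, from the directory where the job was submitted, before resubmitting:
#   clean_job_dir 02_QP_PPA_pure-mpi-q
```

The `-d` test is deliberate: the input file passed with `-F` has to survive, because without it the report shows `Input file : (none)` and no calculation is done.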

Re: NetCDF: Start+count exceeds ... ERROR MESSAGE

Postby RainerRutka » Tue Jul 14, 2015 4:00 pm

Hi Daniele!

did you remove the 02_QP_PPA_pure-mpi-q directory before running the job?

No, I didn't. If the input files are present, I get the errors.

If I remove only the "02_QP_PPA_pure-mpi-q" FILE (it's a file, not a directory), I get:

UC:[kn_pop235844@uc1n996 bwhpc-examples]$ tail -20 r-02_QP_PPA_pure-mpi-q_setup

Code:
[...]
 .-ACKNOWLEDGMENT
 |
 | The users of YAMBO have little formal obligations with respect to
 | the YAMBO group (those specified in the GNU General Public
 | License, http://www.gnu.org/copyleft/gpl.txt). However, it is
 | common practice in the scientific literature, to acknowledge the
 | efforts of people that have made the research possible. In this
 | spirit, please find below the reference we kindly ask you to use
 | in order to acknowledge YAMBO:
 |
 | Yambo: An ab initio tool for excited state calculations
 | A. Marini, C. Hogan, M. Gr"uning, D. Varsano
 | Computer Physics Communications  180, 1392 (2009).
 |
 
 .-Input file : (none)


So the report shows "Input file : (none)", and there are no results.

You can check the input files here:

Code:
UC:[kn_pop235844@uc1n996 YAMBO]$ pwd && ls -l Inputs
/opt/bwhpc/common/phys/yambo/4.0.1/YAMBO_TUTORIALS/Parallel/YAMBO
total 32
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw  742 15. Apr 15:40 01_init
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 02_QP_PPA_pure-mpi-comb
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 02_QP_PPA_pure-mpi-cvb
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 02_QP_PPA_pure-mpi-k
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 02_QP_PPA_pure-mpi-q
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 03_QP_PPA_pure-omp
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2480 15. Apr 15:40 03_QP_PPA_pure-omp-scaling
-rw-r--r--. 1 kn_pop235844 uc1-adm-sw 2481 15. Apr 15:40 04_QP_PPA_hyb-mpi-omp
