NetCDF: Unknown file format

fabio.caruso
Posts: 9
Joined: Fri Jul 18, 2014 5:10 pm

NetCDF: Unknown file format

Post by fabio.caruso » Tue Jul 28, 2015 3:07 pm

Dear Yambo Developers,

I am having some problems running Yambo (v4.0.1 r.88) with the NetCDF libraries on a Cray machine.
I have successfully configured and compiled the code, linking against the Cray-optimized NetCDF and HDF5 libraries. In particular, after running the configure script with the machine-specific flags, I get the following:

#
# [VER] 4.0.1 r.88
#
# - GENERAL CONFIGURATIONS -
#
# [SYS] linux@x86_64
# [SRC] /home/e411/e411/fcaruso/espresso-5.1.2/yambo-4.0.1-epl
# [BIN] /home/e411/e411/fcaruso/espresso-5.1.2/yambo-4.0.1-epl/bin
# [-] Double precision
# [X] Redundant compilation 
# [-] Run-Time timing profile
#
# - PARALLEL SUPPORT -
#
# [X] MPI (open-mpi kind)
# [-] OpenMP
# [-] Blue-Gene specific procedures
#
# - LIBRARIES (E=external library; I=internal library; -=not used;) -
#
#  I/O
# [ E ] IOTK   : /home/e411/e411/fcaruso/espresso-5.1.2/iotk//src/libiotk.a (QE 5.0)
# [ - ] ETSF_IO: 
# [ E ] NETCDF : -L/opt/cray/netcdf-hdf5parallel/4.3.2/INTEL/140//lib -lnetcdff -lnetcdf (No large files support)
# [ E ] HDF5   : -L/opt/cray/hdf5-parallel/1.8.13/INTEL/140//lib -L/opt/cray/hdf5-parallel/1.8.13/INTEL/140//lib -lhdf5_fortran -lhdf5_hl -lhdf5 (No specific HDF5-IO support)
#
#  MATH
# [ E ] FFT      : -L/opt/cray/fftw/3.3.4.2/sandybridge//lib -lfftw3 (FFTW v3)
# [ E ] BLAS     : -L/opt/intel/composer_xe_2013_sp1.4.211/mkl/lib/intel64/ -lmkl_intel_lp64  -lmkl_sequential -lmkl_core 
# [ E ] LAPACK   : -L/opt/intel/composer_xe_2013_sp1.4.211/mkl/lib/intel64/ -Wl,--start-group -lmkl_intel_lp64  -lmkl_sequential -lmkl_core -Wl,--end-group -ldl
# [ E ] SCALAPACK: -L/opt/intel/composer_xe_2013_sp1.4.211/mkl/lib/intel64/ -lmkl_scalapack_lp64
#
#  OTHER
# [ I ] LibXC      : -lxc
# [ - ] MPI library: 
#
# - COMPILERS, MAKE and EDITOR -
#
# [ CPP ] cc -E -ansi -D_NETCDF_IO -D_MPI -D_FFTW -D_FFTW_OMP -D_SCALAPACK      -D_OPENMPI 
# [  C  ] cc -g -O2  -D_C_US -D_FORTRAN_US
# [MPICC] cc -g -O2  -D_C_US -D_FORTRAN_US
# [ F90 ] ftn -assume bscc -O2 -static -ip -nofor_main
# [MPIF ] ftn -assume bscc -O2 -static -ip -nofor_main
# [ F77 ] ftn -assume bscc -O2 -static -ip -nofor_main
# [Cmain] -Mnomain
# [NoOpt]  -assume bscc -O0 -static -nofor_main
#
# [ MAKE ] make
# [EDITOR] vim
#
"make yambo interfaces" generates the executables without problems.

However, when I run yambo to calculate the PPA quasiparticle corrections for silicon, I systematically run into the following error:

P0002: [ERROR] STOP signal received while in :[07] Dynamic Dielectric Matrix (PPA)
P0002: [ERROR][NetCDF] NetCDF: Unknown file format

This occurs whenever yambo finishes computing the first q point of the polarizability.
Moreover, the job does not terminate after the error: it keeps occupying the nodes of the cluster until I delete it manually.
I attach the input/output files of the calculation.

Any idea of the origin of this problem?

Thanks in advance for your help!

Best,
Fabio
Fabio Caruso
Department of Materials
University of Oxford
Parks Road
Oxford, OX1 3PH, UK

Daniele Varsano
Posts: 2097
Joined: Tue Mar 17, 2009 2:23 pm

Re: NetCDF: Unknown file format

Post by Daniele Varsano » Tue Jul 28, 2015 4:02 pm

Dear Fabio,
I do not know whether this is related to the NetCDF library you linked.
Can you check whether the problem persists with a different parallelization strategy (i.e. not assigning all the CPUs to the q's or, better, avoiding parallelization over the q's altogether)?

Best,
Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/

andrea.ferretti
Posts: 102
Joined: Fri Jan 31, 2014 11:13 am

Re: NetCDF: Unknown file format

Post by andrea.ferretti » Tue Jul 28, 2015 4:36 pm

Hi Fabio,

actually this looks very much like an issue we are investigating at the moment
(which is triggered by the q-parallelism in the calculation of X).

For the time being, as Daniele suggested, I would just avoid the q-parallelism.
We'll let you know when the problem is fixed.

ciao ciao
Andrea
Andrea Ferretti, PhD
CNR-NANO-S3 and MaX Centre
via Campi 213/A, 41125, Modena, Italy
Tel: +39 059 2055322; Skype: andrea_ferretti
URL: http://www.nano.cnr.it

fabio.caruso
Posts: 9
Joined: Fri Jul 18, 2014 5:10 pm

Re: NetCDF: Unknown file format

Post by fabio.caruso » Tue Jul 28, 2015 5:03 pm

Dear Daniele and Andrea,

thanks for your reply! The problem disappeared when I turned off the q-parallelization, as you said.
(However, the problem with the q-parallelization does not seem to depend on the libraries I have linked. I reconfigured and recompiled against a different version of the NetCDF and HDF5 libraries available on the Cray, and the issue persists.)

Thanks a lot for your help (and for the amazing work on Yambo)!

Best,
Fabio

andrea.ferretti
Posts: 102
Joined: Fri Jan 31, 2014 11:13 am

Re: NetCDF: Unknown file format

Post by andrea.ferretti » Tue Jul 28, 2015 5:07 pm

Hi Fabio,

thanks for reporting.
Indeed, in our experience the problem seems to be related to the MPI communicators or something similar, and it only surfaces at the I/O stage, in a non-reproducible way (something like two tasks trying to write to the same file at the same moment).
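
One way to look for a file left half-written by such a collision is to inspect the leading bytes of each database file: "NetCDF: Unknown file format" is the message the netCDF library returns when it does not recognise a file's signature. What follows is a minimal Python sketch and not part of Yambo. The `ndb.*` glob pattern and both helper names are assumptions made for illustration, and only a signature at offset 0 is checked.

```python
from pathlib import Path
import tempfile

# Signatures accepted at the start of a file.
CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")  # classic, 64-bit offset, CDF-5
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"                      # netCDF-4 files are HDF5 files


def netcdf_kind(path):
    """Return 'classic', 'hdf5', or None if the leading bytes match neither."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head[:4] in CLASSIC_MAGIC:
        return "classic"
    if head == HDF5_MAGIC:
        return "hdf5"
    return None


def suspicious_databases(directory, pattern="ndb.*"):
    """List files matching `pattern` (an assumed naming scheme) with no known signature."""
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and netcdf_kind(p) is None
    )


# Self-contained demonstration on throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ndb.good").write_bytes(b"CDF\x01" + bytes(28))  # plausible classic header
    Path(tmp, "ndb.partial").write_bytes(b"")                  # empty, as after an aborted write
    demo_result = suspicious_databases(tmp)

print(demo_result)
```

An empty list does not prove the files are intact, since a file can carry a valid signature and still be truncated further on; a non-empty list simply points at files worth deleting before a rerun.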

take care
Andrea

jmullen
Posts: 29
Joined: Wed Apr 01, 2009 6:29 pm

Re: NetCDF: Unknown file format

Post by jmullen » Mon Aug 17, 2015 12:49 am

Hello,

I am experiencing the same problem in a very unpredictable way. Unfortunately, it is happening for well over half of my runs. I read the thread here and I do not think I am parallelizing over q points. I am following the descriptions in the tutorials for the cvb file example (which runs fine) in the YAMBO_TUTORIAL directory.

For 16 processors, I am using the following

X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= "1 1 4 4" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= "1 1 16" # [PARALLEL] CPUs for each role
X_all_q_nCPU_invert=0 # [PARALLEL] CPUs for matrix inversion
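
Whether a given set of these lines parallelizes over q can be checked mechanically. The sketch below is a hypothetical helper, not a Yambo tool: it pairs each `*_ROLEs` string with its `*_CPU` string and reports the count assigned to `q`. It assumes, as the 16-processor values above suggest, that the per-role counts multiply to the total number of MPI tasks.

```python
import re
from math import prod  # Python 3.8+


def parse_parallel(text, prefix):
    """Return {role: n_cpu} for the <prefix>_ROLEs / <prefix>_CPU pair in an input file."""

    def value(name):
        # Match e.g.  X_all_q_CPU= "1 1 4 4"  at the start of a line.
        m = re.search(rf'^\s*{re.escape(name)}\s*=\s*"([^"]*)"', text, re.MULTILINE)
        if m is None:
            raise KeyError(name)
        return m.group(1).split()

    roles = value(prefix + "_ROLEs")
    cpus = [int(n) for n in value(prefix + "_CPU")]
    if len(roles) != len(cpus):
        raise ValueError(f"{prefix}: {len(roles)} roles but {len(cpus)} CPU counts")
    return dict(zip(roles, cpus))


# The lines quoted in the post above.
sample = '''
X_all_q_ROLEs= "q k c v" # [PARALLEL] CPUs roles (q,k,c,v)
X_all_q_CPU= "1 1 4 4" # [PARALLEL] CPUs for each role
SE_ROLEs= "q qp b" # [PARALLEL] CPUs roles (q,qp,b)
SE_CPU= "1 1 16" # [PARALLEL] CPUs for each role
'''

x = parse_parallel(sample, "X_all_q")
se = parse_parallel(sample, "SE")
print("X : q =", x["q"], " total =", prod(x.values()))
print("SE: q =", se["q"], " total =", prod(se.values()))
```

For the values quoted here the `q` entry is 1 in both blocks, which is consistent with the statement that the run is not parallelized over q points.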

Keeping that configuration, but changing various parameters for convergence (EXXRLvcs for example) I do not have a very high success rate for run completion, let alone actual meaningful convergence.

I don't really have a question here so much as contributing my observations to the discussion.

Version: Version 4.0.1 Revision 88
Run parameters: have run with both -S and without -S
System: MoS2

Regards
Jeff Mullen
NCSU
Jeff Mullen
NCSU Physics

Daniele Varsano
Posts: 2097
Joined: Tue Mar 17, 2009 2:23 pm

Re: NetCDF: Unknown file format

Post by Daniele Varsano » Mon Aug 17, 2015 7:13 am

Dear Jeff,
Thank you very much for reporting. We will investigate this thoroughly very soon.
Best,
Daniele

andrea.ferretti
Posts: 102
Joined: Fri Jan 31, 2014 11:13 am

Re: NetCDF: Unknown file format

Post by andrea.ferretti » Mon Aug 17, 2015 10:07 am

Dear Jeff,

thanks for reporting this issue.
Could you please add some more details to help us figure out what is going on?
In particular: the relevant log files, the input and report files, config.log, and any other information
you deem relevant.

thank you
Andrea

hlee
Posts: 29
Joined: Mon Jul 15, 2013 2:09 pm

Re: NetCDF: Unknown file format

Post by hlee » Mon Aug 17, 2015 11:45 am

Dear all:

I had the same problem regardless of whether q-parallelization was used, and I worked around it simply by adding the line DBsIOoff= "DIP" to the input file.
The oscillator strengths are then not stored in the databases, but I think this is a reasonable workaround at the moment.
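
For anyone applying this workaround to many input files, the edit can be scripted. The following is a minimal Python sketch with a hypothetical helper name. It appends the line exactly as quoted above when no `DBsIOoff` entry exists. If an entry with a different value is already present, it refuses to modify it, because how several values combine is not stated in this thread.

```python
import re


def add_dbs_io_off(input_text, tag="DIP"):
    """Return input_text with a DBsIOoff= "<tag>" line, appending one if none exists."""
    m = re.search(r'^\s*DBsIOoff\s*=\s*"([^"]*)"', input_text, re.MULTILINE)
    if m is not None:
        if tag in m.group(1).split():
            return input_text  # already set, nothing to do
        raise ValueError(f'DBsIOoff is already set to "{m.group(1)}"; edit it by hand')
    # Make sure the new line starts on its own line.
    sep = "" if (not input_text or input_text.endswith("\n")) else "\n"
    return f'{input_text}{sep}DBsIOoff= "{tag}"\n'


# Demonstration on two lines taken from earlier in the thread.
example = 'X_all_q_ROLEs= "q k c v"\nX_all_q_CPU= "1 1 4 4"\n'
patched = add_dbs_io_off(example)
print(patched)
```

Calling the helper a second time on its own output leaves the text unchanged, so it is safe to run over files that were already edited.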

Sincerely,
Dr. Hyungjun Lee
Institute of Theoretical Physics, EPFL

jmullen
Posts: 29
Joined: Wed Apr 01, 2009 6:29 pm

Re: NetCDF: Unknown file format

Post by jmullen » Mon Aug 17, 2015 1:46 pm

Hello,

I am attaching the LOG directory, the sequence of commands I run (cmds.sh) and the input file I am using for testing. I am sure you know this, but this input does not describe a converged system: I created a run with very few k points to experiment with the NetCDF problem. This is one of about 50 input file variations I have tried over the last week, covering many permutations of the parallelization variables. I am not parallelizing, and have not tried to parallelize, over q, qp, etc., only over the c, v, and b roles, as the tutorial suggests this is the combination to use to lower the memory per processor.

I do not have the config.log as I did not compile the code, our HPC admins did. I can try to get the config.log if this doesn't help.

And finally, I did not post the SAVE directory due to its size (2.6G). If anyone requires that, and this application will allow me to upload it, I will.

Thanks,
Jeff Mullen
NCSU
