Segmentation fault when running hBN-2D-para example

Postby lance xu » Sat May 26, 2018 6:28 am

Hi,
I am learning how to work with Yambo in a parallel environment following the hBN-2D-para example.
http://www.yambo-code.org/wiki/index.php?title=GW_parallel_strategies
While testing pure MPI scaling, runs with 1, 2, 4, and 16 MPI processes all give seemingly good results. However, the run with 8 MPI processes fails with the error shown below.
Code:
[cg17-4.agave.rc.asu.edu:mpi_rank_5][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_7][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_3][error_sighandler] Caught error: Segmentation fault (signal 11)
[cg17-4.agave.rc.asu.edu:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: cg17-4: tasks 1,3,5,7: Segmentation fault

I don't quite understand why this happens only with 8 MPI processes. My Yambo (4.2.2) is compiled with mvapich2/2.3b, Intel 2018x, and QE 6.2.1. MKL is linked for BLACS, BLAS, LAPACK, ScaLAPACK, and FFT. The other required libraries are the internal ones.

Thank you very much!
Weiqing Xu
Weiqing Xu
Department of Physics,
Arizona State University, US
https://isearch.asu.edu/profile/2392919
lance xu
 
Posts: 4
Joined: Wed May 23, 2018 11:57 pm

Re: Segmentation fault when running hBN-2D-para example

Postby Daniele Varsano » Mon May 28, 2018 8:32 am

Dear Weiqing Xu,
that sounds quite strange. We cannot say much from the error you posted; can you attach your input, report, and log files?
Thanks,

Daniele
Dr. Daniele Varsano
S3-CNR Institute of Nanoscience and MaX Center, Italy
MaX - Materials design at the Exascale
http://www.nano.cnr.it
http://www.max-centre.eu/
Daniele Varsano
 
Posts: 2027
Joined: Tue Mar 17, 2009 2:23 pm

Re: Segmentation fault when running hBN-2D-para example

Postby lance xu » Mon May 28, 2018 6:14 pm

Hi Daniele,

The attached tarball contains my input, report, log, and some configuration information.
Thank you very much!

Weiqing Xu

Re: Segmentation fault when running hBN-2D-para example

Postby Daniele Varsano » Mon May 28, 2018 6:40 pm

Dear Weiqing,
I suspect the problem is related to the CPUs assigned to the linear algebra, considering that you are dealing with a very small matrix (NGsBlkXp= 4 RL).
Note that in the tutorial the size of the screening matrix is set to 4 Ry (an energy cutoff) and not 4 RL (just 4 reciprocal-lattice vectors).
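To make the unit distinction concrete: in a Yambo input, `RL` counts reciprocal-lattice (G) vectors while `Ry` sets an energy cutoff, so `NGsBlkXp= 4 RL` yields only a 4×4 response block. The sketch below is a hypothetical Python helper (not part of Yambo) that parses such a line and reports which interpretation applies; it assumes the `name= value unit` format seen in this thread and handles only the `RL` and `Ry` units.

```python
import re

# Hypothetical helper (not part of Yambo): match lines like "NGsBlkXp= 4 RL".
# Only the RL and Ry units are recognized in this sketch.
_LINE = re.compile(r"^\s*(\w+)\s*=\s*([0-9.]+)\s*(RL|Ry)\b")


def parse_block_size(line):
    """Return (variable, value, unit), or None if the line does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    name, value, unit = m.groups()
    return name, float(value), unit


def describe(line):
    """Explain whether the value is a G-vector count or an energy cutoff."""
    parsed = parse_block_size(line)
    if parsed is None:
        return "unrecognized line"
    name, value, unit = parsed
    if unit == "RL":
        return f"{name}: {value:g} reciprocal-lattice vectors"
    return f"{name}: energy cutoff of {value:g} Ry"


print(describe("NGsBlkXp= 4 RL"))  # NGsBlkXp: 4 reciprocal-lattice vectors
print(describe("NGsBlkXp= 4 Ry"))  # NGsBlkXp: energy cutoff of 4 Ry
```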

Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Postby lance xu » Mon May 28, 2018 7:13 pm

Hi Daniele,

Thank you for catching that! But the same error message still shows up even after fixing it.
Here are the new run and log files.

Weiqing

Re: Segmentation fault when running hBN-2D-para example

Postby Daniele Varsano » Tue May 29, 2018 8:54 am

Dear Weiqing,
I will try to reproduce your problem. In the meantime, can you repeat your calculations using:
Code:
X_all_q_nCPU_LinAlg_INV= 1
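When rerunning a series of jobs, it can be convenient to patch this variable in the input file programmatically rather than by hand. The following is a minimal Python sketch; `set_input_variable` is a hypothetical helper, and it assumes the simple one-assignment-per-line `name= value` format seen in this thread.

```python
import re


def set_input_variable(text, name, value):
    """Set `name= value` in Yambo input text (hypothetical helper).

    An existing assignment of `name` is replaced in place (keeping its
    indentation); otherwise the assignment is appended at the end.
    """
    pattern = re.compile(rf"^([ \t]*){re.escape(name)}\s*=.*$", re.MULTILINE)
    new_line = f"{name}= {value}"
    if pattern.search(text):
        return pattern.sub(lambda m: m.group(1) + new_line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"
```

For example, `set_input_variable(text, "X_all_q_nCPU_LinAlg_INV", 1)` replaces an existing assignment of that variable, or appends one if none is present.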


Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Postby lance xu » Tue May 29, 2018 10:52 pm

Hi Daniele,

It works: the error message no longer shows up, and my plot of run time versus the number of MPI tasks looks as expected. So what is special about 8 MPI processes?
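A timing series like this is often summarized as speedup and parallel efficiency relative to the smallest task count n0: S(n) = T(n0)/T(n) and E(n) = S(n)·n0/n. A minimal Python sketch follows; `scaling_table` is a hypothetical helper, and no timings from this thread are included.

```python
def scaling_table(timings):
    """Speedup and parallel efficiency relative to the smallest task count.

    `timings` maps number of MPI tasks -> wall time (any consistent unit).
    Returns {n: (speedup, efficiency)}, where efficiency 1.0 is ideal.
    """
    if not timings:
        raise ValueError("need at least one timing")
    n0 = min(timings)
    t0 = timings[n0]
    table = {}
    for n in sorted(timings):
        speedup = t0 / timings[n]
        efficiency = speedup / (n / n0)  # ideal speedup is n / n0
        table[n] = (speedup, efficiency)
    return table
```

Pass your own measured wall times as a dict keyed by the number of MPI tasks.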

Weiqing

Re: Segmentation fault when running hBN-2D-para example

Postby Daniele Varsano » Wed May 30, 2018 8:14 am

Dear Weiqing,
I've reproduced your problem; we will investigate what is going wrong and fix it.
In the meantime, you can safely continue to use Yambo without linear-algebra parallelization; as you can see, the scaling is very good even without it.

Thanks for reporting,

Best,
Daniele

Re: Segmentation fault when running hBN-2D-para example

Postby Davide Sangalli » Wed Jun 27, 2018 11:37 am

Dear Weiqing,

Code:
X_all_q_nCPU_LinAlg_INV

must be set to a perfect square, i.e. 1, 4, 9, 16, etc.

With such a value it should work.
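A quick way to check or pick an allowed value is sketched below in Python. The helper names are hypothetical; the perfect-square rule comes from the reply above, while capping the value at the number of MPI tasks is an assumption (the linear-algebra CPUs are presumably drawn from the running tasks).

```python
from math import isqrt


def is_valid_linalg_ncpu(n):
    """True if n is a positive perfect square (1, 4, 9, 16, ...)."""
    if n < 1:
        return False
    root = isqrt(n)
    return root * root == n


def largest_valid_linalg_ncpu(n_mpi):
    """Largest perfect square not exceeding the number of MPI tasks.

    Assumption: the value should not exceed the number of MPI tasks.
    """
    if n_mpi < 1:
        raise ValueError("need at least one MPI task")
    root = isqrt(n_mpi)
    return root * root


for n in (1, 2, 4, 8, 16):
    print(n, is_valid_linalg_ncpu(n), largest_valid_linalg_ncpu(n))
# prints:
# 1 True 1
# 2 False 1
# 4 True 4
# 8 False 4
# 16 True 16
```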
Best,
D.
Davide Sangalli, PhD
CNR-ISM, Division of Ultrafast Processes in Materials (FLASHit) and MaX Centre
http://www.ism.cnr.it/en/davide-sangalli-cv/
http://www.max-centre.eu/
Davide Sangalli
 
Posts: 315
Joined: Tue May 29, 2012 4:49 pm
Location: Via Salaria Km 29.3, CP 10, 00016, Monterotondo Stazione, Italy

