GNU Parallel and SIGSEGV/SIGABRT


In an attempt to cut the run time of a simulation from 15 days to something much shorter, I looked into GNU Parallel. It does the job, but throws some errors that I cannot explain.

The code is:

parallel "./create_ffile.py -r {2} -s {1}; GENENMM -pdb file.pdb -fcust ffile.txt; DIAGSTD; FREQEN; RMSCOL" :::: arg1.txt arg2.txt

where GENENMM, DIAGSTD, FREQEN and RMSCOL are Fortran programs, and the arg files contain the variables used to create an ffile.txt that is fed into GENENMM.

The errors are:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference. Backtrace for this error: #0 aaaaaaaaaaaa #1 ..... etc

and

Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 aaaaaaaaaaaa #1 ..... etc

Both errors are followed by either (core dumped) DIAGSTD or (core dumped) RMSCOL

What I cannot understand is why they only appear for some {1}-{2} combinations and not all. Furthermore, sometimes both errors appear together and sometimes only one of them does. From what I have read online, the problem seems to lie in the Fortran codes. But then why does it not happen for all files? Does it have something to do with the fact that they are all running in parallel?

Thanks for any help/comments in advance! Marie


Solution

  • It is not clear to me which files the different programs use. My guess is that every copy reads and writes the same files (ffile.txt, for example). If multiple copies run at the same time, they will then interfere with each other, which would not happen if they were run serially.

    So the solution is to make each copy run on different files. The standard way to do this is to make a directory for each copy. Something like this:

    parallel "mkdir {#}; cd {#}; ../create_ffile.py -r {2} -s {1}; GENENMM -pdb ../file.pdb -fcust ffile.txt; DIAGSTD; FREQEN; RMSCOL" :::: arg1.txt arg2.txt
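The same idea can be sketched as a small shell function, which is easier to read and to test than a long quoted command. This is only an illustration under assumptions: run_job and the job_N directory names are hypothetical, and a printf stands in for create_ffile.py so the sketch runs without the real programs. Two jobs are started concurrently, and each ends up with its own ffile.txt.

```shell
#!/bin/sh
# Sketch: run one job's whole pipeline inside its own directory, so that
# files such as ffile.txt are never shared between concurrent jobs.
# run_job and the job_N directory names are hypothetical.

run_job() {
    job_id=$1; s=$2; r=$3
    dir="job_$job_id"
    mkdir -p "$dir" || return 1
    (
        # Subshell: the cd does not leak into the caller.
        cd "$dir" || exit 1
        # Stand-in for: ../create_ffile.py -r "$r" -s "$s"
        printf 'r=%s s=%s\n' "$r" "$s" > ffile.txt
        # The real pipeline would follow here, stopping at the first failure:
        # GENENMM -pdb ../file.pdb -fcust ffile.txt && DIAGSTD && FREQEN && RMSCOL
    )
}

# Demo in a scratch directory: two jobs running at the same time.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
run_job 1 0.5 10 &
run_job 2 0.7 20 &
wait
cat job_1/ffile.txt job_2/ffile.txt
```

With GNU Parallel under bash, the function could be exported and called with the job number and both arguments, assuming {1} is the -s value and {2} the -r value as in the question:

    export -f run_job
    parallel run_job {#} {1} {2} :::: arg1.txt arg2.txt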