performance optimization parallel-processing gnu-parallel

Why does GNU parallel affect script speed?

I have a Fortran program, which I compile with gfortran and then run as time ./a.out.

My script completes, and outputs the runtime as,

real 0m36.037s
user 0m36.028s
sys 0m0.004s

i.e. ~36 seconds

Now suppose I want to run this script multiple times, in parallel. For this I am using GNU Parallel.

Using the lscpu command tells me that I have 8 CPUs, with 2 threads per core and 4 cores per socket.

I create some file example.txt of the form,

time ./a.out
time ./a.out
time ./a.out
time ./a.out
...

which goes on for 8 lines.

I can then run these in parallel on 8 cores as,

parallel -j 8 :::: example.txt

In this case I would expect the runtime for each script to still be 36 seconds, and the total runtime to be ~36 seconds. However, what actually happens is that the runtime of each script roughly doubles.

If I run on 4 cores instead of 8 (-j 4), the problem disappears and each script goes back to taking 36 seconds to run.

What is the cause of this? I have heard talk in the past of 'overheads', but I am not sure exactly what is meant by this.

Solution

What is happening is that you have only one socket with 4 physical cores in it. Those are the real cores of your machine. The total number of CPUs you see as output of lscpu is calculated using the following formula: #sockets * #cores_per_socket * #threads_per_core. In your case it is 1*4*2=8.
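As a rough sketch of that arithmetic, the three fields can be pulled out of the lscpu output and multiplied together. The function name logical_cpus is only an illustrative helper, and the field labels are assumed to match the usual English-locale lscpu output ("Socket(s):", "Core(s) per socket:", "Thread(s) per core:").

```shell
# Hypothetical helper: reads `lscpu` output on stdin and prints
# sockets * cores_per_socket * threads_per_core.
# Assumes the English field labels that lscpu normally prints.
logical_cpus() {
    awk -F: '
        /Socket\(s\):/          { sockets = $2 + 0 }
        /Core\(s\) per socket:/ { cores   = $2 + 0 }
        /Thread\(s\) per core:/ { threads = $2 + 0 }
        END { print sockets * cores * threads }
    '
}

# Usage on a Linux machine with lscpu installed:
#   lscpu | logical_cpus
```

With the values quoted in the question (1 socket, 4 cores per socket, 2 threads per core) this should print 8, matching the CPU count that lscpu reports.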

The threads per core are a kind of virtual CPU, and they do not always perform like real CPUs, especially for compute-intensive processing (this feature is called hyperthreading). The two threads on a core share that core's execution resources, so when you squeeze two compute-bound processes onto one core they end up being executed almost serially.
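One way to avoid oversubscribing the cores is to pass the number of physical cores to -j instead of the number of logical CPUs. The following is a minimal sketch, assuming a Linux machine where lscpu supports the parseable -p=Core,Socket output; physical_cores is a hypothetical helper name, not part of GNU Parallel.

```shell
# Hypothetical helper: reads `lscpu -p=Core,Socket` output on stdin and
# prints the number of distinct (core, socket) pairs, i.e. physical cores.
physical_cores() {
    # Drop the '#' header lines, collapse duplicate pairs (one per
    # hardware thread), and count what is left.
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Usage (assumes lscpu and GNU Parallel are installed):
#   parallel -j "$(lscpu -p=Core,Socket | physical_cores)" :::: example.txt
```

On the machine described in the question this should give -j 4, which is the setting at which each run went back to taking about 36 seconds.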

Take a look at this article for more info.