c++ architecture operating-system cluster-computing hpc

Why does execution time increase when the number of CPUs increases?

I ran the same C++ program with the same problem size on different numbers of CPUs on an HPC cluster, and found that the execution time increased as the number of CPUs increased. I was expecting a significant decrease in execution time. Can anyone shed some light on this issue?

Below are my execution times for each number of CPUs:

  Number of CPUs      Problem size         Time (seconds)
  1                   3000000              15.48
  2                   3000000              18.2
  4                   3000000              21.73
  8                   3000000              40.55
  16                  3000000              60.14
  32                  3000000              98.75
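To put numbers on how far these timings are from the expected scaling, here is a minimal sketch that computes speedup and parallel efficiency from the table. The names `Run`, `measured_runs`, `speedup`, and `efficiency` are made up for illustration; only the timings come from the measurements above.

```cpp
#include <vector>

// One row of the measurements table: CPU count and wall time in seconds.
struct Run {
    int cpus;
    double seconds;
};

// The timings from the table, all for a fixed problem size of 3000000.
inline std::vector<Run> measured_runs() {
    return {{1, 15.48}, {2, 18.2}, {4, 21.73},
            {8, 40.55}, {16, 60.14}, {32, 98.75}};
}

// Speedup S(p) = T(1) / T(p). A value below 1 means the parallel run
// is slower than the single-CPU run.
inline double speedup(double t1, double tp) {
    return t1 / tp;
}

// Parallel efficiency E(p) = S(p) / p. A value of 1.0 would be ideal
// linear scaling.
inline double efficiency(double t1, double tp, int p) {
    return speedup(t1, tp) / p;
}
```

For example, `speedup(15.48, 98.75)` is roughly 0.16, so the 32-CPU run is about six times slower than the single-CPU run, and every multi-CPU row in the table has a speedup below 1.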

My thoughts:

Communication between the CPUs increases as more CPUs are added, and that overhead increases the execution time.
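One way to reason about this hypothesis is a toy cost model in which the compute part shrinks with the number of CPUs while a communication term grows with it. The model below is an assumption for illustration only: the linear communication term, the parameter `t_comm`, and the function names are hypothetical and are not fitted to the measurements.

```cpp
// Toy cost model (an assumption, not fitted to the data above):
//   T(p) = t_compute / p + t_comm * (p - 1)
// t_compute: time of the perfectly divisible work on one CPU.
// t_comm:    hypothetical extra communication cost per additional CPU.
inline double modeled_time(double t_compute, double t_comm, int p) {
    return t_compute / p + t_comm * (p - 1);
}

// Returns the CPU count in [1, max_p] with the lowest modeled time,
// preferring the smaller count on ties.
inline int best_cpu_count(double t_compute, double t_comm, int max_p) {
    int best = 1;
    for (int p = 2; p <= max_p; ++p) {
        if (modeled_time(t_compute, t_comm, p) <
            modeled_time(t_compute, t_comm, best)) {
            best = p;
        }
    }
    return best;
}
```

Under this model, once `t_comm` is large relative to `t_compute`, adding CPUs only makes things slower: `best_cpu_count(15.48, 10.0, 32)` returns 1, which is the same qualitative pattern as in the table.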

Solution

Hope this explains it:

"There are two major factors that influence performance: the speed of the CPUs themselves, and the speed of their access to memory. In a cluster, it’s fairly obvious that a given CPU will have fastest access to the RAM within the same computer (node). Perhaps more surprisingly, similar issues are relevant on a typical multicore laptop, due to differences in the speed of main memory and the cache. Consequently, a good multiprocessing environment should allow control over the “ownership” of a chunk of memory by a particular CPU."
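As a rough sketch of what "ownership" of a chunk of memory can look like in C++, the code below splits an array into one contiguous block per worker and pads each worker's result slot to its own cache line. The 64-byte cache line size and the names `Range`, `owned_range`, and `PaddedSlot` are assumptions for illustration; nothing here is taken from the original program.

```cpp
#include <algorithm>
#include <cstddef>

// Half-open index range [begin, end) owned by one worker.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Block partition of n elements over `workers` workers. Each worker gets
// one contiguous chunk, and the first (n % workers) workers get one extra
// element, so each worker touches only its own region of the array.
inline Range owned_range(std::size_t n, std::size_t workers, std::size_t rank) {
    const std::size_t base = n / workers;
    const std::size_t extra = n % workers;
    const std::size_t begin = rank * base + std::min(rank, extra);
    const std::size_t len = base + (rank < extra ? 1 : 0);
    return {begin, begin + len};
}

// Per-worker result slot aligned to an assumed 64-byte cache line, so two
// workers never write to the same line (which would cause false sharing).
struct alignas(64) PaddedSlot {
    double value = 0.0;
};

static_assert(sizeof(PaddedSlot) == 64, "one slot per assumed cache line");
```

With the problem size from the question, `owned_range(3000000, 32, 31)` covers indices 2906250 up to (but not including) 3000000, so the last of 32 workers owns the final 93750 elements.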