I have run the same C++ problem size of different number of CPUs on an HPC cluster, but what I figured out is that when the number of CPUs increased the execution time also increased. I was expecting a significant decrease in execution time. Can anyone shed some light in this issue?
Below are my execution times per # of CPUs
Number of CPUs Problem size Time (seconds)
1 3000000 15.48
2 3000000 18.2
4 3000000 21.73
8 3000000 40.55
16 3000000 60.14
32 3000000 98.75
My thoughts:
Hope this explains it:
"There are two major factors that influence performance: the speed of the CPUs themselves, and the speed of their access to memory. In a cluster, it’s fairly obvious that a given CPU will have fastest access to the RAM within the same computer (node). Perhaps more surprisingly, similar issues are relevant on a typical multicore laptop, due to differences in the speed of main memory and the cache. Consequently, a good multiprocessing environment should allow control over the “ownership” of a chunk of memory by a particular CPU."