openmp multicore overhead hyperthreading

Is CPU time relevant to Hyperthreading?

Is increased CPU time (as reported by time CLI command) indicative of inefficiency when hyperthreading is used (e.g. time spent in spinlocks or cache misses) or is it possible that the CPU time is inflated by the odd nature of HT? (e.g. real cores being busy and HT can't kick in)

I have quad-core i7, and I'm testing trivially-parallelizable part (image to palette remapping) of an OpenMP program — with no locks, no critical sections. All threads access a bit of read-only shared memory (look-up table), but write only to their own memory.

 cores real CPU
  1:   5.8  5.8
  2:   3.7  5.9
  3:   3.1  6.1
  4:   2.9  6.8
  5:   2.8  7.6
  6:   2.7  8.2
  7:   2.6  9.0
  8:   2.5  9.7

I'm concerned that amount of CPU time used increases rapidly as number of cores exceeds 1 or 2.

I imagine that in an ideal scenario CPU time wouldn't increase much (same amount of work just gets distributed over multiple cores).

Does this mean there's 40% of overhead spent on parallelizing the program?

Solution

It's quite possibly an artefact of how CPU time is measured. A trivial example, if you run a 100 MHz CPU and a 3 GHz CPU for one second each, each will report that it ran for one second. The second CPU might do 30 times more work, but it takes one second.

With hyperthreading, a reasonable (not quite accurate) model would be that one core can run either one task at lets say 2000 MHz, or two tasks at lets say 1200 MHz. Running two tasks it does only 60% of the work per thread, but 120% of the work for both threads together, a 20% improvement. But if the OS asks how many seconds of CPU time was used, the first will report "1 second" after each second on real time, while the second will report "2 seconds".

So the reported CPU time goes up. If it less than doubles, overall performance is improved.