Search code examples
performancealgorithmparallel-processingtheory

For parallel algorithm with N threads, can performance gain be more than N?


A theoretical question, maybe it is obvious:

Is it possible that an algorithm, after being implemented in a parallel way with N threads, will be executed more than N times faster than the original, single-threaded algorithm? In other words, can the gain be better that linear with number of threads?


Solution

  • It's not common, but it most assuredly is possible.

    Consider, for example, building a software pipeline where each step in the pipeline does a fairly small amount of calculation, but requires enough static data to approximately fill the entire data cache -- but each step uses different static data.

    In a case like this, serial calculation on a single processor will normally be limited primarily by the bandwidth to main memory. Assuming you have (at least) as many processors/cores (each with its own data cache) as pipeline steps, you can load each data cache once, and process one packet of data after another, retaining the same static data for all of them. Now your calculation can proceed at the processor's speed instead of being limited by the bandwidth to main memory, so the speed improvement could easily be 10 times greater than the number of threads.

    Theoretically, you could accomplish the same with a single processor that just had a really huge cache. From a practical viewpoint, however, the selection of processors and cache sizes is fairly limited, so if you want to use more cache you need to use more processors -- and the way most systems provide to accomplish this is with multiple threads.