Tags: erlang, multicore, parallel-processing, performance

Reasons for sub-linear speedup in parallel programs


What are the reasons a parallelized program doesn't achieve the ideal speedup?

So far I have thought of data dependencies, the cost of transferring data between threads (or actors), and synchronisation when several of them access the same data structures. Are there any other reasons, or subcategories of the ones I mentioned?

I'm particularly interested in problems that occur in the Erlang actor model, but other issues are welcome too.


Solution

A few, in no particular order:

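1. Cache-line sharing (false sharing): variables used by different processors that happen to sit on the same cache line force that line to bounce between the cores' caches, even though the theoretical model treats the variables as independent.
2. Context-switch overhead: if you have more runnable threads than cores, time is lost switching between them.
3. Kernel scalability issues: the operating-system kernel may scale well to, say, 4 cores but become less efficient at 8.
4. Lock convoying: when many threads repeatedly contend for the same lock, they end up queued behind it and proceed one at a time, with each handoff paying for a context switch.
5. Amdahl's law: the speedup is limited by the portion of the program that cannot be parallelized. If 5% of the run time is serial, no number of cores can make the program more than 20 times faster.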
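Amdahl's law can be made concrete with a small calculation. If a fraction p of the single-core run time parallelizes perfectly and the rest is serial, the best possible speedup on N cores is S(N) = 1 / ((1 - p) + p / N), which approaches 1 / (1 - p) as N grows. The Python sketch below computes that bound. The second function is a hypothetical extension, not part of Amdahl's law: it adds a made-up coordination cost c * (N - 1) that grows with the core count, standing in loosely for effects like context switching, lock convoys or cache-line traffic. The linear model and the value of `c` are assumptions chosen for illustration, not measurements.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Upper bound on speedup with n cores when a fraction p of the
    single-core run time parallelizes perfectly (Amdahl's law)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    # Serial part stays the same; parallel part is divided by n.
    return 1.0 / ((1.0 - p) + p / n)


def speedup_with_overhead(p: float, n: int, c: float) -> float:
    """Hypothetical model: Amdahl's law plus a coordination cost of
    c * (n - 1), expressed as a fraction of the single-core run time.
    The linear overhead term is an illustrative assumption only."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1.0 / ((1.0 - p) + p / n + c * (n - 1))


if __name__ == "__main__":
    p = 0.95   # assumed parallel fraction
    c = 0.005  # assumed per-extra-core overhead (hypothetical)
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:3d} cores: Amdahl {amdahl_speedup(p, n):5.2f}x, "
              f"with overhead {speedup_with_overhead(p, n, c):5.2f}x")
```

With any positive overhead term, adding cores eventually makes the modeled program slower rather than just less efficient, which is one way the other items in the list show up in practice.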