I have a serial(nonparallel) application written in C. I have modified and re-written it using Intel Threading Building Blocks. When I run this parallel version on an AMD Phenom II machine which is a quad-core machine, I get a performance gain of more than 4X which conflicts with the Amdahl's law. Can anyone give me a reason why this is happening?
Thanks, Rakesh.
If you rewrite the program, you could make it more efficient. Amdahl's law only limits the amount of speedup due to parallelism, not how much faster you can make your code by improving it.
You could be realizing the effects of having 4x the cache, since now you can use all four procs. Or maybe having less contention with other processes running on your machine. Or you accidentally fixed a mispredicted branch.
TL/DR: it happens.