In parallel computing theoretically super-linear speedup is not possible. But in practice we do see such cases. One reason is cache effect but I fail to understand what does it play. Also, there are other things involved but what are they? In summary,
How are super-linear speedups possible?
I'm a beginner with respect to parallel computing.
Suppose you have an 8 processor machine, each processor has a 1MB cache, and your computation uses 6MB of data.
On 1 processor the computation will be doing a lot of data movement between CPU, cache and RAM. On 8 processors the computation will only have to move data between CPU and cache. This way you can achieve super-linear speedup.
These figures and this analysis have been simplified for exposition for a beginner.