Where does super-linear speedup come from?

In parallel computing, super-linear speedup is theoretically impossible, yet in practice we do see such cases. One reason is the cache effect, but I fail to understand what role it plays. There are also other factors involved, but what are they? In summary:

How are super-linear speedups possible?

I'm a beginner with respect to parallel computing.

Solution

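Suppose you have an 8-processor machine, each processor has a 1MB cache, and your computation uses 6MB of data.

On 1 processor, the 6MB of data does not fit in the 1MB cache, so the computation spends a lot of time moving data between the CPU, the cache and RAM. On 8 processors, each processor handles only about 6MB / 8 = 0.75MB, which fits in its own 1MB cache (together the caches hold 8MB). Once the data has been loaded, the computation only has to move data between each CPU and its cache, which is much faster than going to RAM. Each processor therefore does an eighth of the work and also does it faster per item, so the overall speedup can exceed 8. This is how you can achieve super-linear speedup.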

These figures and this analysis have been simplified to suit a beginner.
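The cache argument above can be made concrete with a toy simulation. The Python sketch below models each processor's private 1MB cache as an LRU cache of 64-byte lines and assumes the computation sweeps sequentially over its share of the 6MB of data several times. The line size, the costs `HIT_COST` and `MISS_COST`, and the number of sweeps `SWEEPS` are hypothetical values chosen for illustration, not measurements. The model also ignores shared memory bandwidth, cache coherence and synchronization.

```python
from collections import OrderedDict

# Figures from the example: 1MB private cache per processor, 6MB of data.
LINE_BYTES = 64                                   # assumed cache-line size
CACHE_LINES = (1 * 1024 * 1024) // LINE_BYTES     # 16384 lines per cache
DATA_LINES = (6 * 1024 * 1024) // LINE_BYTES      # 98304 lines of data

# Hypothetical costs (arbitrary time units) and workload shape.
HIT_COST = 1       # access served from the processor's own cache
MISS_COST = 20     # access that has to go out to RAM
SWEEPS = 5         # the computation walks over its data this many times


def simulate_lru(num_lines, cache_lines, sweeps):
    """Count (hits, misses) for repeated sequential sweeps through an LRU cache."""
    cache = OrderedDict()
    hits = misses = 0
    for _ in range(sweeps):
        for line in range(num_lines):
            if line in cache:
                cache.move_to_end(line)          # mark as most recently used
                hits += 1
            else:
                misses += 1
                cache[line] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)    # evict least recently used
    return hits, misses


def wall_time(processors):
    """Time for the whole job with the data split evenly across processors."""
    share = DATA_LINES // processors             # lines handled by each processor
    hits, misses = simulate_lru(share, CACHE_LINES, SWEEPS)
    # All processors do identical work concurrently, so one processor's
    # time is the wall-clock time of the whole job.
    return hits * HIT_COST + misses * MISS_COST


if __name__ == "__main__":
    t1 = wall_time(1)
    t8 = wall_time(8)
    print(f"1 processor:  {t1} units")
    print(f"8 processors: {t8} units")
    print(f"speedup:      {t1 / t8:.1f}x")
```

Under these assumptions it reports 9830400 units on 1 processor, 294912 units on 8 processors, and a speedup of 33.3x. The single processor misses on every access, because a sequential sweep over data larger than an LRU cache evicts each line before it is reused; each of the 8 processors misses only during its first sweep. Different costs give a different number, but in this model anything above 8x comes from the cache, not from the extra processors.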
