Tags: algorithm, random, hash, computer-vision, perlin-noise

Why does Perlin noise use a hash function rather than computing random values?


I'm reading through this explanation of Perlin noise, which describes a hash function that calculates random values for all x, y coordinates.

If the hashes for the x, y coordinates are generated randomly and are eventually used to compute the gradients and such, why couldn't I just generate random numbers on the fly?

Is it simply a question of optimization that we use a permutation table as a hash to find our random values? The only reason I could think of is that running values through the permutation table somehow produces a smoothing effect, but I fail to see how.

Just for clarification, I'm referring to this section of the code:

private static readonly int[] p = { 151,160,137,91,90,15,                 // Hash lookup table as defined by Ken Perlin.  This is a randomly
    131,13,201,95,96,53,194,233,7,225,140,36,103,30,69,142,8,99,37,240,21,10,23,    // arranged array of all numbers from 0-255 inclusive.
    190, 6,148,247,120,234,75,0,26,197,62,94,252,219,203,117,35,11,32,57,177,33,
    88,237,149,56,87,174,20,125,136,171,168, 68,175,74,165,71,134,139,48,27,166,
    77,146,158,231,83,111,229,122,60,211,133,230,220,105,92,41,55,46,245,40,244,
    102,143,54, 65,25,63,161, 1,216,80,73,209,76,132,187,208, 89,18,169,200,196,
    135,130,116,188,159,86,164,100,109,198,173,186, 3,64,52,217,226,250,124,123,
    5,202,38,147,118,126,255,82,85,212,207,206,59,227,47,16,58,17,182,189,28,42,
    223,183,170,213,119,248,152, 2,44,154,163, 70,221,153,101,155,167, 43,172,9,
    129,22,39,253, 19,98,108,110,79,113,224,232,178,185, 112,104,218,246,97,228,
    251,34,242,193,238,210,144,12,191,179,162,241, 81,51,145,235,249,14,239,107,
    49,192,214, 31,181,199,106,157,184, 84,204,176,115,121,50,45,127, 4,150,254,
    138,236,205,93,222,114,67,29,24,72,243,141,128,195,78,66,215,61,156,180
};

int aaa, aba, aab, abb, baa, bba, bab, bbb;
aaa = p[p[p[    xi ]+    yi ]+    zi ];
aba = p[p[p[    xi ]+inc(yi)]+    zi ];
aab = p[p[p[    xi ]+    yi ]+inc(zi)];
abb = p[p[p[    xi ]+inc(yi)]+inc(zi)];
baa = p[p[p[inc(xi)]+    yi ]+    zi ];
bba = p[p[p[inc(xi)]+inc(yi)]+    zi ];
bab = p[p[p[inc(xi)]+    yi ]+inc(zi)];
bbb = p[p[p[inc(xi)]+inc(yi)]+inc(zi)];

Why don't we just initialize the values as follows?

var rng = new Random();
aaa = rng.Next(256);   // 0-255 inclusive
aab = rng.Next(256);
// ...

Solution

  • The key idea behind Perlin noise generation is to create a grid of points, each of which is assigned some vector value, and then to interpolate between those points in a specific way.

    I checked Ken Perlin's original paper on Perlin noise, and as far back as that paper he recommends using a hash function for this:

    Associate with each point in the integer lattice a pseudorandom value and x, y, and z gradient values. More precisely, map each ordered sequence of three integers into an uncorrelated ordered sequence of four real numbers [a,b,c,d] = H([x,y,z]), where [a,b,c,d] define a linear equation with gradient [a,b,c] and value d at [x,y,z]. H is best implemented as a hash function.

    (Emphasis mine).
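
    To make that concrete, here is a minimal sketch of what such an H can look like: a deterministic function mapping an integer lattice point to a gradient. The class and method names, the mixing constants, and the gradient set below are my own illustrative assumptions, not Perlin's actual H:

    static class HashedGradients
    {
        static readonly (float x, float y, float z)[] Gradients =
        {
            (1, 1, 0), (-1, 1, 0), (1, -1, 0), (-1, -1, 0),
            (1, 0, 1), (-1, 0, 1), (1, 0, -1), (-1, 0, -1),
            (0, 1, 1), (0, -1, 1), (0, 1, -1), (0, -1, -1),
        };

        // The same lattice point always hashes to the same gradient,
        // which is all the algorithm needs from H.
        public static (float x, float y, float z) GradientAt(int x, int y, int z)
        {
            unchecked
            {
                int h = x * 374761393 + y * 668265263 + z * 1440662683;
                h = (h ^ (h >> 13)) * 1274126177;
                h ^= h >> 16;
                return Gradients[(h & 0x7fffffff) % Gradients.Length];
            }
        }
    }

    The doubled permutation-table lookup in the question's code serves the same purpose; it is just a particularly cheap integer hash.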

    I suspect that the reason for this has to do with memory concerns. Perlin noise generation requires that the gradient function at different points in space be reevaluated multiple times over the course of the run of the algorithm. Accordingly, you could either

    1. have some formula that, given a point in space, evaluates to the gradient, or
    2. explicitly create a table and store all of the random values that you need.

    Option (1) is what Ken Perlin is proposing. The advantage of this approach is that the memory usage required to store the gradients is minimal; you just need to use a hash function.

    Option (2) is what you're proposing. This works just fine, but it uses a ton of memory (you need multiple values stored for each point in the integer lattice you're working with). Remember that Perlin's paper was written back in 1985 (!) when memory was much, much scarcer than it is today.
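
    For contrast, here is a rough sketch of what option (2) could look like if you stored gradients explicitly; the Size constant, the Build helper, and the fixed seed are illustrative assumptions, and the point is only that the memory cost grows with the volume of lattice you cover:

    using System;

    static class ExplicitGradientTable
    {
        // Covering even a 256^3 block this way costs 256^3 points x 12 bytes,
        // roughly 200 MB, versus the 256-byte permutation table of option (1).
        const int Size = 64;
        public static readonly (float, float, float)[,,] Gradients = Build();

        static (float, float, float)[,,] Build()
        {
            var rng = new Random(12345);   // fixed seed so the noise is reproducible
            var table = new (float, float, float)[Size, Size, Size];
            for (int x = 0; x < Size; x++)
                for (int y = 0; y < Size; y++)
                    for (int z = 0; z < Size; z++)
                        table[x, y, z] = ((float)rng.NextDouble() * 2 - 1,
                                          (float)rng.NextDouble() * 2 - 1,
                                          (float)rng.NextDouble() * 2 - 1);
            return table;
        }
    }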

    My suspicion is that you can get away with either approach, but given that you don't need true randomness, the pseudorandomness afforded by a good hash function should be sufficient.

    I can't explain why the author of that article you read chose to use the particular hash function that they did, though. My guess is that it's "random enough" and sufficiently fast that it doesn't end up being the bottleneck in the computation; remember that the hash function gets called a lot of times in the noise generation code. This seems to be the standard approach to implementing Perlin noise; even Ken Perlin mentions using this hash function on his site.

    What you can't do is the approach you're proposing of just letting the variables aaa, aab, aba, etc. be freshly drawn random numbers. The reason is that the Perlin noise algorithm requires you to reevaluate the noise term at a given point multiple times and expects that it will give back the same value every time. If you wanted to compute truly random values, you could do so, but you'd need to cache your results so that you give back consistent answers for the noise term at each point.
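
    If you really wanted to go the random-number route, a cache along these lines would give you that consistency; the class name, the tuple key, and the fixed seed are illustrative, not part of any standard Perlin implementation:

    using System;
    using System.Collections.Generic;

    static class CachedNoiseValues
    {
        static readonly Dictionary<(int x, int y, int z), int> Cache =
            new Dictionary<(int x, int y, int z), int>();
        static readonly Random Rng = new Random(42);   // fixed seed keeps runs reproducible

        // Plays the same role as p[p[p[xi]+yi]+zi] in the question's code,
        // but draws each value from the RNG the first time and remembers it.
        public static int HashedValue(int x, int y, int z)
        {
            var key = (x, y, z);
            if (!Cache.TryGetValue(key, out int value))
            {
                value = Rng.Next(256);      // 0-255, like the permutation table entries
                Cache[key] = value;
            }
            return value;                   // identical on every call for the same point
        }
    }

    Note that this trades the hash function's tiny, constant footprint for a cache that grows with every distinct lattice point you touch, which is exactly the memory concern described above.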