How can one be sure that a function is really random or as close to the notion as possible? Also, what is the distinction between random and pseudo-random? Finally, what algorithms/sources can be used to generate random numbers?
P.S: Also asking this because a MySQL statement using ORDER BY RAND() LIMIT 1
isn't giving convincing results.
Aloha!
There are several methods and tools for testing for randomness. These are applied on a set of numbers collected from the generator to be tested. That is, you test the generator based on a set of data generated.
In computing, esp for IT-security we normally want to have a generator that conforms to a uniform random process. There are many different processes, but I'm guessing that it is a uniform process you are aiming for.
NIST has published several documents with recommendations on both pseudo random number generators as well how to test them. Look at NIST documents SP 800-22 and SP 800-20.
As somebody else pointed out. If you want a True Random Number Generator (TRNG) you need to gather physical entropy. Examples of such sources are radioactive decay, cosmic radiation, lava lamps etc. Preferably you want sources that are hard to manipulate. IETF has an RFC that have some good recommendations, see RFC 4086 - Source of Randomness for Security: https://www.rfc-editor.org/rfc/rfc4086
What you normally do is to collect entropy from one ore more (preferably more than one) source. The collected data is then filtered (whitening) and finally used to periodically seed a good PRNG. With different seeds, naturally.
This is how most modern good random generators works. An entropy collector feeding a PRNG created using cryptographic primitives such as symmetric ciphers (AES for example) or hash functions. See for example the random generator Yarrow/Fortuna by Schneier, which in modified form is used in FreeBSD.
Coming back to your question about testing. As somebody pointed out Marsaglia have produced a good set of tests, which was codified in the DIEHARD tests. There are now an even more exapnded set of tests in the Dieharder tests: http://www.phy.duke.edu/~rgb/General/dieharder.php
Dieharder is a good tool that will give you good confidence that the huge pile of numbers supplied to it (collected from your generator) is random (with good quality) or not. Running Dieharder is easy, but will take some time.
In situ testing of randomness is hard. You don't normally want to implement Dieharder in your system. What you can do is implement some simple detectors that should detect patholigical cases. I usually suggest:
Equal value length. A simple counter that is reset whenever two consequtive values generated by the RNG differs. And then you need to define a threshold when you think the counter shows that the RNG is broken. If you see 10 million equal values and the value space is greater that one value (the one you see) your RNG is probably not working all that well. Esp if the value are seeing is one of the edge values. For example 0x00000.... or 0xfffff...
Median value. If you after generating a million values and have a uniform distribution have a median value that is heavily leaning towards one of the value space edges, not close to the middle, someting is probably also amiss.
Variance. If you after generating million of values haven't seen values close to the MIN and MAX of the value space, but instead have a narrow generated value space, then something is also amiss.
Finally. Since you hopefully are using a good PRNG (based on AES for example), the in situ-tests suggested might instead be applied on the entropy source.
I hope that helped in some ways.