Search code examples
randomcryptographyentropy

How to get truly random data, not random data fed into a PRNG seed like CSRNG's do?


From what I understand, a CSRNG like RNGCryptoServiceProvider still passes the truly random user data like mouse movement, etc through a PRNG to sort of sanitize the output and make it equal distribution. The bits need to be completely independent.

(this is for a theoretical infinite computing power attacker) If the CSRNG takes 1KB of true random data and expands it to 1MB, all the attacker has to do is generate every combination of 1KB of data, expand it, and see which 1MB of data generates a one-time pad that returns sensible english output. I read somewhere that if the one-time pad had a PRNG anywhere in the RNG, it is just a glorified stream cipher. I was wondering if the truly random starting data was in large enough numbers to just use instead of cryptographically expanding. I need truly random output for a one-time pad, not just a cryptographically secure RNG. Or perhaps if there were other ways to somehow get truly random data, so that all bits are independent of each other. I was thinking of XOR'ing with the mouse coordinates for a few seconds, then perhaps the last digits of the Environment.TickCount, then maybe getting microphone input (1, 2, 3, 4) as well. However, as some point out on stackoverflow, I should really just let the OS handle it all. Unfortunately that isn't possible since there is an PSRNG used. I would like to avoid a hardware solution, since this is meant to be an easy to use program, and also not utilize RDRAND since it ALSO uses a PRNG (unless RDRAND can return the truly random data before it goes through a PRNG??). Would appreciate any responses if such a thing is even possible; I've been working on this for weeks under the impression that RNGCryptoServiceProvider was sufficient for a one time pad. Thanks.

(Side note: some say for most crypto functions you don't need true entropy, just unpredictability. for a one-time pad, it MUST be random otherwise it is not a one time pad.)


Solution

  • As you know, "truly random" means each of the bits is independent of everything else as well as uniformly distributed. However, this ideal is hard, if not impossible, to achieve in practice. In general, the closest way to get "truly random data" in practice is to gather hard-to-guess bits from nondeterministic sources, then condense those bits into a random block of data.

    There are many issues involved with getting this close to "truly random data", including the following:

    1. The sources must be nondeterministic, that is, their output cannot be determined by their inputs. Examples of nondeterministic sources include timings of input devices; thermal noise; and the noise registered by microphone and camera outputs.
    2. The sources' output must be hard to guess. This is more formally known as entropy, such as 32 bits of entropy per 64 bits of output. However, measuring entropy is far from trivial. If you need 1 MB (8 million bits) of truly random data, you need to have data with at least 8 million bits of entropy (which in practice will be many times more than 1 MB long depending on the sources), then condense the data somehow into 1 MB of data while preserving that entropy.
    3. The sources must be independent of each other.
    4. There should be two or more independent sources. This is because it's impossible to extract full randomness from just one source (see McInnes and Pinkas 1990). On the other hand, extracting randomness from three or more independent sources is relatively trivial, but there is still a matter of choosing an appropriate randomness extractor, and a survey of randomness extractors would be beyond the scope of this answer.

    In general, for random number generation purposes, the more sources available, the better.

    REFERENCES:

    • McInnes, J. L., & Pinkas, B. (1990, August). On the impossibility of private key cryptography with weakly random keys. In Conference on the Theory and Application of Cryptography (pp. 421-435).