Tags: c++, performance, c++11, random

Why is random device creation expensive?


C++11 supports pseudo-random number generation through the <random> library.

I've seen multiple books mention that it is quite expensive to repeatedly construct and destroy std::random_device, std::uniform_int_distribution<>, and std::uniform_real_distribution<> objects, and they recommend keeping a single copy of these objects in the application.

Why is it expensive to create/destroy these objects? What exactly is meant by expensive here? Is it expensive in terms of execution speed, or executable size, or something else?

Can someone please provide some explanation?
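
For reference, here is the kind of usage I understand the books to be warning against, contrasted with keeping a single long-lived copy. The two helper functions are made up by me for illustration, not taken from any of the books:

    #include <random>

    // Pattern the books warn against: a fresh random_device and distribution
    // are constructed and destroyed on every single call.
    int roll_each_time()
    {
        std::random_device rd;
        std::uniform_int_distribution<int> dist(1, 6);
        return dist(rd);
    }

    // Pattern the books recommend (as I understand it): construct the objects
    // once and reuse them for the lifetime of the application or thread.
    int roll_reused()
    {
        static thread_local std::random_device rd;
        static thread_local std::uniform_int_distribution<int> dist(1, 6);
        return dist(rd);
    }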


Solution

  • The short answer is: It depends on the system and library implementation

    Types of standard libraries

    • In low-quality implementations, such as MinGW before 9.2, random_device is nothing but a superfluous wrapper around std::rand(). I'm not aware of such an implementation still being current, but someone may correct me on this (the sketch after this list shows a quick way to probe for one)

    • On a bare-metal system, e.g. an embedded microcontroller, random_device may interact directly with a hardware random number generator or a corresponding CPU feature. That may or may not require an expensive setup, e.g. to configure the hardware, open communication channels, or discard the first N samples

    • Most likely you are on a hosted platform, meaning a modern operating system with a hardware abstraction layer. Let's consider this case for the rest of this post
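
    As a quick probe for the first case: the standard requires a random_device that is really a deterministic engine to report an entropy estimate of 0 via entropy(). This is only a heuristic, since some genuine implementations also report 0, but it costs one line to check. A minimal sketch of my own:

        #include <iostream>
        #include <random>

        int main()
        {
            std::random_device rd;
            // A deterministic (fake) random_device must report an entropy of 0.
            // Caveat: some real implementations also report 0, so a non-zero
            // value proves real entropy, but 0 alone proves nothing.
            std::cout << "entropy estimate: " << rd.entropy() << '\n';
            std::cout << "sample: " << rd() << '\n';
        }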

    Types of random_device

    Your system may have a real hardware random number generator; for example, a TPM (Trusted Platform Module) can act as one. See How does the Trusted Platform Module generate its true random numbers? Any access to this hardware has to go through the operating system; on Windows this would likely be a Cryptographic Service Provider (CSP).

    Or your CPU may have one built in, such as Intel's rdrand and rdseed instructions. In this case a random_device that maps directly to these just has to discover that they are available and check that they are operational. rdrand, for example, can detect hardware failure, at which point the implementation may provide a fallback. See What are the exhaustion characteristics of RDRAND on Ivy Bridge?
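
    For illustration, drawing bits straight from the CPU feature looks roughly like this with the GCC/Clang intrinsic. This is a sketch of my own, assuming x86-64 and compilation with -mrdrnd; the retry loop reflects the fact that rdrand can transiently report failure:

        #include <immintrin.h>
        #include <cstdint>
        #include <stdexcept>

        // Draw 64 random bits via the rdrand instruction, retrying a few times
        // because the instruction may transiently signal failure.
        std::uint64_t rdrand64()
        {
            unsigned long long value = 0;
            for (int attempt = 0; attempt < 10; ++attempt) {
                if (_rdrand64_step(&value))  // returns 1 on success, 0 on failure
                    return value;
            }
            throw std::runtime_error("rdrand failed repeatedly: possible hardware fault");
        }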

    However, since these features may not be available, operating systems generally provide an entropy pool to generate random numbers. If these hardware features are available, your OS may use them to feed this pool or provide a fallback once the pool is exhausted. Your standard library will most likely just access this pool through an OS-specific API.

    That is what random_device is on all mainstream library implementations at the moment: an access point to the OS facilities for random number generation. So what is the setup overhead of these?

    System APIs

    • A traditional POSIX (UNIX) operating system provides random numbers through the pseudo-devices /dev/random and /dev/urandom, so the setup cost is that of opening (and later closing) one of these files. I assume this is what your book refers to (see the sketch after this list)

    • Since this API has some downsides, newer APIs have popped up, such as Linux's getrandom. This one has no setup cost, but it may fail if the kernel does not support it, at which point a good library may fall back to /dev/urandom

    • Windows libraries likely go through the Windows crypto API, meaning either the old CSP function CryptGenRandom or the newer BCryptGenRandom. Both require a handle to a service or algorithm provider, so this may be similar to the /dev/urandom approach
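
    To make the Linux bullets concrete, here is a simplified sketch of the kind of work such a library does under the hood: try getrandom first and fall back to the /dev/urandom device. This is my own illustration (assuming glibc 2.25+ for the getrandom wrapper), not the actual code of any standard library:

        #include <cerrno>
        #include <cstddef>
        #include <cstdio>
        #include <stdexcept>
        #include <sys/random.h>  // getrandom(), glibc 2.25+

        // Fill buf with len random bytes, preferring the getrandom() system call
        // and falling back to the /dev/urandom pseudo-device if it is unavailable.
        void fill_random(unsigned char* buf, std::size_t len)
        {
            ssize_t n = getrandom(buf, len, 0);
            if (n == static_cast<ssize_t>(len))
                return;  // one system call, no file handle needed
            if (n < 0 && errno != ENOSYS)
                throw std::runtime_error("getrandom failed");

            // Fallback: pay the setup cost of opening (and closing) the device file.
            std::FILE* f = std::fopen("/dev/urandom", "rb");
            if (!f)
                throw std::runtime_error("cannot open /dev/urandom");
            std::size_t got = std::fread(buf, 1, len, f);
            std::fclose(f);
            if (got != len)
                throw std::runtime_error("short read from /dev/urandom");
        }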

    Consequences

    In all these cases you will need at least one system call to access the RNG, and these are significantly slower than normal function calls; see System calls overhead. Even the rdrand instruction takes around 150 cycles per invocation (see What is the latency and throughput of the RDRAND instruction on Ivy Bridge?), or worse (see RDRAND and RDSEED intrinsics on various compilers?).

    A library (or user) may be tempted to reduce the number of system calls by buffering a larger number of random bytes, e.g. with buffered file I/O. This again would make opening and closing the random_device unwise, assuming this discards the buffer.
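
    As an illustration of that buffering idea, a wrapper can keep /dev/urandom open through a buffered stdio stream, so one underlying read(2) serves many draws; destroying the wrapper throws whatever is left in the buffer away. A rough sketch of my own, assuming a POSIX system, not how any particular library actually implements random_device:

        #include <cstdint>
        #include <cstdio>
        #include <stdexcept>

        // Long-lived handle to /dev/urandom. stdio's buffering means one read(2)
        // pulls in a few kilobytes, and most calls to operator() are served from
        // that buffer. Destroying the object (fclose) discards the leftover bytes,
        // which is why churning through such objects is wasteful.
        class urandom_source
        {
        public:
            urandom_source() : file_(std::fopen("/dev/urandom", "rb"))
            {
                if (!file_)
                    throw std::runtime_error("cannot open /dev/urandom");
            }
            ~urandom_source() { std::fclose(file_); }

            urandom_source(const urandom_source&) = delete;
            urandom_source& operator=(const urandom_source&) = delete;

            std::uint32_t operator()()
            {
                std::uint32_t value;
                if (std::fread(&value, sizeof value, 1, file_) != 1)
                    throw std::runtime_error("short read from /dev/urandom");
                return value;
            }

        private:
            std::FILE* file_;
        };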

    Additionally, the OS entropy pool has a limited size and can be exhausted, potentially causing the whole system to suffer (either by working with sub-par random numbers or by blocking until enough entropy is available again). This, together with the slow performance, means that you should not usually feed random_device directly into a uniform_int_distribution or a similar distribution. Instead, use it to initialize a pseudo-random number generator.
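
    In code, that advice typically looks like the following: a few expensive draws from random_device seed a cheap engine once, and every value after that avoids the operating system entirely. A common pattern, sketched here with mt19937 and seed_seq:

        #include <iostream>
        #include <random>

        int main()
        {
            // A handful of expensive draws from the OS-backed source...
            std::random_device rd;
            std::seed_seq seed{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};

            // ...seed a cheap engine once; further values need no system calls.
            std::mt19937 engine(seed);
            std::uniform_real_distribution<double> dist(0.0, 1.0);
            for (int i = 0; i < 5; ++i)
                std::cout << dist(engine) << '\n';
        }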

    Of course this has exceptions. If your program needs just 64 random bytes over the course of its entire runtime, it would be foolish to draw 2.5 kiB of random state just to initialize a Mersenne Twister, for example. Or if you need the best possible entropy, e.g. to generate cryptographic keys, then by all means use it directly (or better yet, use a library for this; never reinvent crypto!)