mt19937 and uniform_real_distribution

I am trying to find an efficient way to implement a uniform(0,1) distribution. Since I have to generate a very large number of samples, I chose mt19937 as engine. I am using the version from the boost library. My question is: what is the difference between using the output of the engine itself vs using uniform_real_distribution?

Option #1

std::random_device rd;
boost::mt19937 gen(rd());
boost::random::uniform_real_distribution<double> urand(0, 1);

for ( int i = 0; i < 1E8; i++ ) {
    u = urand(gen);
}

Option #2

std::random_device rd;
boost::mt19937 gen(rd());

for ( int i = 0; i < 1E8; i++ ) {
    u = (double) gen()/gen.max();
}

From my tests, Option #2 is considerably better than Option #1 in terms of runtime. Is there any reason I should pick Option #1 over Option #2?

Solution

I don't know the underlying implementation of urand(), but using the result of a division is likely to produce bias in the low-order bits as a quantisation effect. If gen.max() isn't large then "low-order bits" may be very many or most of the bits of the result.

The performance disparity may come from producing properly distributed random numbers. If double is overly precise for your needs then perhaps using float might allow it to run more efficiently.