
Differences between "UUIDv4 generation libraries" -vs- "just rolling your own with random hex characters or bits"


On the topic of the makeup of a UUID4, according to Wikipedia...

Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B

So there are three possible methods I can think of that a programmer can use to generate a random UUID4:

MethodA) Use a "proper" UUID4 generation library that already exists.

-or-

MethodB) Roll-your-own simply by using random hex characters in a string:

  • Start with a string "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx"
  • Replace the "x" characters with random hex characters (0-9a-f)
  • Replace the "y" with any one of: 8 9 a b

The above steps are just one simple example of how this could be done as a character string. Please consider any other method operating on a character string and randomly chosen hex characters to still be "MethodB", for example starting with an empty string and appending characters one at a time.
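
As a rough sketch of MethodB in Python: the function name `uuid4_from_template` is made up for illustration, and using the standard `secrets` module as the random source is my assumption (the steps above don't say where the random characters come from).

```python
import secrets

HEX_DIGITS = "0123456789abcdef"
TEMPLATE = "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx"


def uuid4_from_template() -> str:
    """Build a UUIDv4 string by filling in the template character by character."""
    out = []
    for ch in TEMPLATE:
        if ch == "x":
            # Any random hex character.
            out.append(secrets.choice(HEX_DIGITS))
        elif ch == "y":
            # Variant position: only 8, 9, a or b are allowed.
            out.append(secrets.choice("89ab"))
        else:
            # The dashes and the fixed version digit "4" are copied as-is.
            out.append(ch)
    return "".join(out)


print(uuid4_from_template())
```

Each "x" contributes 4 random bits and the "y" contributes 2, so the string carries 122 random bits in total.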

-or-

MethodC) Roll-your-own with bitwise operations:

I guess this is how most libraries are doing it? Using mostly random bits while ensuring that the "4" and "8/9/a/b" are in the final generated string.
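
A minimal sketch of what I mean by MethodC, in Python. The function name `uuid4_from_bits` is hypothetical and `secrets.randbits` is just my assumed random source; the idea is to take 128 random bits and then force the version and variant bits.

```python
import secrets
import uuid


def uuid4_from_bits() -> uuid.UUID:
    """Take 128 random bits, then overwrite the version and variant bits."""
    n = secrets.randbits(128)

    # Version: bits 76..79 (the 13th hex digit) must be 0b0100, i.e. "4".
    n &= ~(0xF << 76)
    n |= 0x4 << 76

    # Variant: bits 62..63 must be 0b10, which makes the 17th hex digit
    # one of 8, 9, a or b.
    n &= ~(0x3 << 62)
    n |= 0x2 << 62

    return uuid.UUID(int=n)


print(uuid4_from_bits())
```

Six of the 128 bits are fixed this way, leaving 122 random bits.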

Questions:

Q1: Are there technically any differences in the resulting UUIDv4 that would be generated in terms of their randomness or general compatibility with databases etc that will store the UUIDv4?

Q2: Are there any downsides to using MethodB (random hex characters) over MethodA or MethodC (bitwise)?

Q3: Are the "proper" UUIDv4 generation libraries in MethodA doing anything special on top of how the simple approaches in MethodB and MethodC would do it?

Q4: Is any method more likely to run into conflicts?

Q5: Are the resulting UUIDs generated by MethodB + MethodC fully compliant with the UUIDv4 specification (even if they are not compliant in their methodology to get there)?

Notes:

  • This question only pertains to UUID version 4.
  • Obviously it's easier to just use a library; I'm just asking about differences in the resulting generated UUIDv4 rather than the amount of effort the programmer puts in.
  • I'm also not really concerned with performance in the above questions. But if you also have any comments on this, that might be interesting too. I'd assume the libraries are better performance-wise.

Solution

  • Q1: Are there technically any differences in the resulting UUIDv4 that would be generated in terms of their randomness or general compatibility with databases etc that will store the UUIDv4?

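    Technically, no difference, provided each method draws on an equally good random source. The resulting 128-bit value has the same layout however it was assembled, so a database or parser cannot tell the methods apart.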

    Q2: Are there any downsides to using MethodB (random hex characters) over MethodA or MethodC (bitwise)?

    Not really, as long as the hex characters come from a good random number generator.

    Q3: Are the "proper" UUIDv4 generation libraries in MethodA doing anything special on top of how the simple approaches in MethodB and MethodC would do it?

    Libraries generally do exactly what MethodC implies: fill 128 bits from a random source, then overwrite the version and variant bits.

    Q4: Is any method more likely to run into conflicts?

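    Not really. The collision risk depends on the quality of the random source, not on whether the result was assembled from characters or from bits.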
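
To put a rough number on this, here is a sketch using the standard birthday-bound approximation. It assumes all 122 non-fixed bits are uniformly random, which holds for any of the three methods only if the underlying generator is good; the function name `collision_probability` is just for illustration.

```python
import math

# A version 4 UUID has 128 bits, of which 4 (version) and 2 (variant) are fixed.
RANDOM_BITS = 122


def collision_probability(n: int) -> float:
    """Approximate chance of at least one duplicate among n random UUIDv4 values.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2**122)).
    """
    space = 2 ** RANDOM_BITS
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


for count in (10**6, 10**9, 10**12):
    print(count, collision_probability(count))
```

The probability stays negligible until the number of generated UUIDs approaches roughly 2^61, the square root of the 2^122 space.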

    Q5: Are the resulting UUIDs generated by MethodB + MethodC fully compliant with the UUIDv4 specification (even if they are not compliant in their methodology to get there)?

    Yes.
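
One way to convince yourself is to check the output against the textual form quoted in the question. The sketch below is a format check only (it says nothing about how random the digits are), and the names `V4_PATTERN` and `looks_like_uuid4` are hypothetical.

```python
import re
import uuid

# Textual form of a version 4 UUID: the 13th hex digit is 4 and the
# 17th hex digit is one of 8, 9, a, b (upper or lower case).
V4_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def looks_like_uuid4(text: str) -> bool:
    """Return True if text has the layout of a version 4 UUID."""
    return V4_PATTERN.fullmatch(text) is not None


print(looks_like_uuid4(str(uuid.uuid4())))                      # True
print(looks_like_uuid4("00000000-0000-1000-8000-000000000000"))  # False (version 1)
```

Any string that passes this check, whether it came from MethodA, MethodB or MethodC, is indistinguishable in form from a library-generated UUIDv4.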

    Generating (RFC 4122 variant) version 4 UUIDs correctly is not complex, but it does require some understanding of random number generation: for example, the difference between "pseudo-random" number generation and "crypto" quality random number generation.

    A very simple "pseudo-random" number generator will often produce the exact same series of "random" numbers on every run, which is annoying enough that a "seed" is usually introduced to change the sequence.

    Of course, it is just as annoying to get the same UUIDs each time a UUID generator is invoked. Hence, a simple "pseudo-random" number generator is not ideal for generating UUIDs.

    "Crypto" quality random numbers are far less predictable, since they are typically drawn from entropy collected by the operating system, and they are what most UUID version 4 generators use.

    In short, the best UUID version 4 generators are those based on the best random number generators. Section 4.4 of RFC 4122 gives suggestions on how to achieve a higher degree of UUID version 4 randomness.
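
To make the seeding problem concrete, here is a small Python sketch. The helper names `uuid4_from_rng` and `batch` are made up for this example; `random.Random` stands in for a simple seeded pseudo-random generator.

```python
import random
import uuid


def uuid4_from_rng(rng: random.Random) -> uuid.UUID:
    """Build a version 4 UUID from 128 bits of the given (non-crypto) generator."""
    # version=4 makes the uuid module set the version and variant bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def batch(seed: int, count: int = 3) -> list:
    """Generate `count` UUIDs from a generator started with `seed`."""
    rng = random.Random(seed)
    return [uuid4_from_rng(rng) for _ in range(count)]


run_one = batch(seed=1234)
run_two = batch(seed=1234)
print(run_one == run_two)  # True: same seed, same "random" UUIDs
```

Every UUID produced here is well-formed, yet two runs started from the same seed collide on every value. A generator backed by the operating system's entropy source (in Python, `secrets` or `os.urandom`) takes no seed and so avoids this failure mode.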

    There is also a COMB UUID, derived from the RFC 4122 variant, that might be interesting to you.

    -- BONUS: You might want to check out Mahonri Moriancumer's UUID and GUID Generator and Forensics page. It uses a crypto quality random number generator to generate version 4 UUIDs.