Search code examples
uuid

Generating v5 UUID. What is name and namespace?


I've read the man page, but I do not understand what name and namespace are for.

For version 3 and version 5 UUIDs the additional command line arguments namespace and name have to be given. The namespace is either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500"). The name is a string of arbitrary length.

The namespace:

The namespace is either a UUID in string representation or an

Does it mean that I need to store it (UUID v4) somewhere in relation to the generated UUID v5? In either case, why is this not done automatically?

The name is a string of arbitrary length.

name a completely random string? What is the purpose of it then? Can it be decoded from the UUID v5?


Solution

  • Name and namespace can be used to create a hierarchy of (very probably) unique UUIDs.

    Roughly speaking, a type 3 or type 5 UUID is generated by hashing together a namespace identifier with a name. Type 3 UUIDs use MD5 and type 5 UUIDs use SHA1. Only 128-bits are available and 5 bits are used to specify the type, so all of the hash bits don't make it into the UUID. (Also MD5 is considered cryptographically broken, and SHA1 is on its last legs, so don't use this to verify data that needs to be "very secure"). That said, it gives you a way of creating a repeatable/verifiable "hash" function mapping a possibly hierarchical name onto a probabilistically unique 128-bit value, potentially acting like a hierarchical hash or MAC.

    Suppose you have a (key,value) store, but it only supports one namespace. You can generate a large number of distinct logical namespaces using type 3 or type 5 UUIDs. First, create a root UUID for each namespace. This could be a type 1 (host+timestamp) or type 4 (random) UUID so long as you stash it somewhere. Alternatively you could create one random UUID for your root (or use the null UUID: 00000000-0000-0000-0000-000000000000 as root) and then create a reproducible UUID for each namespace using "uuid -v5 $ROOTUUID $NAMESPACENAME". Now you can create unique UUIDs for keys within a namespace using "uuid -v5 $NAMESPACEUUID $KEY". These UUIDs can be thrown into a single key-value store with high probability of avoiding collision. This process can be repeated recursively so that if for instance the "value" associated with a UUID key in turn represents some sort of logical "namespace" like a bucket, container or directory, then its UUID can be used in turn to generate more hierarchical UUIDs.

    The generated type 3 or type 5 UUID holds a (partial) hash of the namespace id and name-within-namespace (key). It no more holds the namespace UUID than does a message MAC hold the contents of the message it is encoded from. The name is an "arbitrary" (octet) string from the perspective of the uuid algorithm. Its meaning however depends on your application. It could be a filename within a logical directory, object-id within an object-store, etcetera.

    While this works well for a moderately large number of namespaces and keys, it eventually runs out of steam if you are aiming for a very large numbers of keys that are unique with very high probability. The Wikipedia entry for the Birthday Problem (aka Birthday Paradox) includes a table that gives the probabilities of at least one collision for various numbers of keys and table sizes. For 128-bits, hashing 26 billion keys this way has a probability of collision of p=10^-18 (negligible), but 26 trillion keys, increases the probability of at least one collision to p=10^-12 (one in a trillion), and hashing 26*10^15 keys, increases the probability of at least one collision to p=10^-6 (one in a million). Adjusting for 5 bits that encode the UUID type, it will run out somewhat faster, so a trillion keys have roughly a 1-in-a-trillion chance of having a single collision.

    See http://en.wikipedia.org/wiki/Birthday_problem#Probability_table for the probability table.

    See http://www.ietf.org/rfc/rfc4122.txt for more details on UUID encodings.