Search code examples
pythonpython-3.xuuid

Will uuid.NAMESPACE_URL generate unique deterministic UUIDs for different URLs?


Question: Will this piece of code generate unique (with very high probability) and deterministic (always) UUIDs for different URLs?

import uuid
uuid1 = uuid.uuid5(uuid.NAMESPACE_URL, 'https://stackoverflow.com/questions/ask1')
uuid2 = uuid.uuid5(uuid.NAMESPACE_URL, 'https://stackoverflow.com/questions/ask2')

ie.

  • uuid1 != uuid2 if url1 != url2 (uniqueness with very high probability), and
  • url1 will always generate uuid1 (deterministic always)

Solution

  • I was going to suggest running your namespace and string through a SHA hash function and then using the lower 128 bits for the UUID.

    Turns out that's very similar what they do in Python for UUID5:

    def uuid5(namespace, name):
         """Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""
         from hashlib import sha1
         hash = sha1(namespace.bytes + bytes(name, "utf-8")).digest()
         return UUID(bytes=hash[:16], version=5)
    

    https://github.com/python/cpython/blob/3db42fc5aca320b0cac1895bc3cb731235ede794/Lib/uuid.py#L718

    I think you would have to be pretty unlucky (like astronomically unlucky) for there to be a collision. And it's deterministic for sure.