Question: Will this piece of code generate unique
(with very high probability) and deterministic
(always) UUIDs for different URLs?
import uuid
uuid1 = uuid.uuid5(uuid.NAMESPACE_URL, 'https://stackoverflow.com/questions/ask1')
uuid2 = uuid.uuid5(uuid.NAMESPACE_URL, 'https://stackoverflow.com/questions/ask2')
i.e.
uuid1 != uuid2 if url1 != url2 (uniqueness with very high probability), and
url1 will always generate uuid1 (deterministic always).

Answer: I was going to suggest running your namespace and string through a SHA hash function and then using the lower 128 bits for the UUID.
Turns out that's very similar to what Python does for UUID5:
def uuid5(namespace, name):
    """Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""
    from hashlib import sha1
    hash = sha1(namespace.bytes + bytes(name, "utf-8")).digest()
    return UUID(bytes=hash[:16], version=5)
https://github.com/python/cpython/blob/3db42fc5aca320b0cac1895bc3cb731235ede794/Lib/uuid.py#L718
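If you want to convince yourself, here is a minimal sketch that rebuilds the same computation by hand and compares it with uuid.uuid5. The helper name manual_uuid5 is made up for this example and is not part of the standard library; the two URLs are the ones from the question.

```python
import hashlib
import uuid


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper mirroring the uuid5 source quoted above."""
    # Hash the 16 namespace bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the first 16 bytes; version=5 overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


url1 = "https://stackoverflow.com/questions/ask1"
url2 = "https://stackoverflow.com/questions/ask2"

first = uuid.uuid5(uuid.NAMESPACE_URL, url1)
again = uuid.uuid5(uuid.NAMESPACE_URL, url1)
other = uuid.uuid5(uuid.NAMESPACE_URL, url2)

print(first == again)   # same namespace and name give the same UUID
print(first != other)   # the two URLs from the question give different UUIDs
print(first == manual_uuid5(uuid.NAMESPACE_URL, url1))
```

No random input, clock, or MAC address goes into the hash, which is why the result is the same on every run and on every machine.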
I think you would have to be pretty unlucky (like astronomically unlucky) for there to be a collision. And it's deterministic for sure.
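To put a rough number on "astronomically unlucky", here is a back-of-the-envelope sketch using the birthday bound. It assumes the truncated SHA-1 output behaves like uniformly random bits, which is an idealization rather than a guarantee, and that 122 of the 128 bits are free (4 are fixed by the version and 2 by the variant). The function name collision_probability is just for this example.

```python
import math

# Assumption: 128 bits minus 4 version bits and 2 variant bits are effectively random.
RANDOM_BITS = 122


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound estimate: p is about 1 - exp(-n(n-1) / 2^(bits+1))."""
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


for n in (10**6, 10**9, 10**12):
    print(f"{n:.0e} URLs -> p = {collision_probability(n):.3e}")
```

Under those assumptions, even a billion distinct URLs give a collision probability far below anything you would hit by accident. This says nothing about deliberately crafted SHA-1 collisions, which is a separate concern from accidental ones.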