I have an application which uses a hash algorithm (currently MD5) to generate a unique ID in a database table. The hash is calculated based on some fields of a row, but nothing checks that calculation, for when those fields are changed later on, the ID of that row doesn't change.
Now I want to change the code to add some new features, while generating a purely random number for the ID could simplify my work a lot (it's a long story to tell why it is much easier for me to generate that ID before I'm able to get all the necessary fields' content for the hash algorithm)
I know that usually the programming language's own random generator generates pseudo random number, but I'm using Python's random.SystemRandom(), which uses operating system's cryptography level 'true' random generator, so I believe it should be the same collision probability comparing with generating the ID with hash algorithm.
Is my understanding correct? If not, why?
Generating X number of bytes of random data gives exactly the same collision probability as using the hash function on some ID's...
ASSUMING...
I would recommend using the system's cryptographic random number provider. Because you've probably made mistakes. Here's an easy one:
Your system: Concatenate column 1 and column 2, and hash the result. You can guarantee you'll never ever do this on those values of column 1 and column 2 ever again. NEVER.
What about when:
vs
Those would create the same hash function.
So who would you trust more to give you random data? Yourself? Or a team of operating system developers including cryptography experts and decades of research and experience? :)
Go with the system's cryptographic random function.