I am building a microservice-based architecture in which multiple microservices run in parallel for horizontal scalability. All services use the same algorithm to generate UUIDs (UUID.randomUUID). Once a UUID is generated, it is saved in the database and returned to the calling service. After a few seconds the caller sends a request to check the status of the txn using that UUID.
In the relational DB the UUID is the primary key. We have seen collisions between UUIDs generated by different services. Questions:
- What is the possibility of duplicate UUIDs across JVMs?
It is possible, but the probability is vanishingly small. The Wikipedia page on the Birthday Problem has a probability table that can be used to estimate the likelihood of a collision.
For example, with 128 bit random UUIDs (and a high quality random number generator) the table says that you would need to generate 2.6 x 10^10 UUIDs for the probability of a collision to reach 1 in 10^18.
Earlier in the article you will find the mathematics on calculating ... and estimating ... the probabilities.
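To make the numbers above concrete, here is a minimal sketch of the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). The class and method names are made up for illustration. Note that a UUID from UUID.randomUUID() is a version 4 UUID with 122 random bits (6 bits are fixed for version and variant), so the second call uses 122 rather than 128.

```java
// Hypothetical helper, not part of any library: estimates collision odds
// with the birthday-bound approximation.
final class CollisionEstimate {

    /**
     * Approximate probability of at least one collision among n values
     * drawn uniformly at random from a space of 2^bits values.
     */
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -n * (n - 1.0) / (2.0 * space);
        // 1 - exp(x) computed as -expm1(x) to keep precision when x is tiny.
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        // The figure quoted from the table: 2.6 x 10^10 values, 128 bits.
        System.out.printf("128 bits, 2.6e10 IDs: %.3e%n",
                collisionProbability(2.6e10, 128));
        // One million version 4 UUIDs (122 random bits).
        System.out.printf("122 bits, 1e6 IDs:    %.3e%n",
                collisionProbability(1e6, 122));
    }
}
```

The first result comes out on the order of 10^-18, matching the table. For one million random UUIDs the estimate is roughly 10^-25, assuming a good random number generator.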
- Should we add some logic in the code to check for a collision before saving to the DB?
It really depends on the number of UUIDs you are likely to generate and store, and on the probability of collision that you are willing to accept.
However, if you are concerned by the possibility of a collision, you could just make the UUID columns unique keys in the relevant database tables. (A primary key column is already unique.) It is more likely that a transaction will fail due to a hardware error than that you will get a collision leading to a uniqueness constraint failure!
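As a rough sketch of the "let the unique key catch it" approach: instead of checking before saving, attempt the insert and generate a fresh UUID only if the key already exists. To keep the example self-contained, a ConcurrentHashMap stands in for the database table here. That is an assumption for illustration only; in real code the unique constraint would be enforced by the database and the retry would be triggered by its constraint-violation error. The class name, method names and the retry limit are all hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a table whose UUID column is a unique key.
final class UniqueIdStore {
    private final Map<UUID, String> rows = new ConcurrentHashMap<>();

    /**
     * Stores the payload under a newly generated id. If the id is already
     * taken, a new one is requested, up to maxAttempts times.
     */
    UUID insert(String payload, Supplier<UUID> idSource, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            UUID id = idSource.get();
            // putIfAbsent returns null only if the key was not present,
            // which plays the role of the uniqueness constraint here.
            if (rows.putIfAbsent(id, payload) == null) {
                return id;
            }
        }
        throw new IllegalStateException(
                "no unique id after " + maxAttempts + " attempts");
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        UniqueIdStore store = new UniqueIdStore();
        UUID id = store.insert("txn-1", UUID::randomUUID, 3);
        System.out.println("stored " + id);
    }
}
```

With properly random UUIDs the retry branch should essentially never run, so if it does fire in practice, that is a signal to investigate the generator rather than to raise the retry limit.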
Followup questions:
I am not sure if this probability is for one generator or multiple?
The number of generators is not relevant, provided that they are independent random number generators.
We have seen collisions a few hundred times with 1 million txns.
The mathematics don't lie. If you have seen a collision a few hundred times with 1 million transactions then something else is wrong. The assumptions are incorrect.
For example:
There are a lot of things that you need to check before you start doubting the mathematics.
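As one possible starting point for that kind of checking (these are illustrative checks of my own choosing, not an exhaustive list), the sketch below takes a batch of IDs, for instance parsed from logs or exported from the table, and reports which ones really are duplicated and whether each one looks like the output of UUID.randomUUID(), that is, version 4 with the standard variant. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical diagnostic helper for a batch of recorded UUIDs.
final class UuidDiagnostics {

    /** Returns every id that occurs more than once, with its count. */
    static Map<UUID, Integer> duplicates(List<UUID> ids) {
        Map<UUID, Integer> counts = new HashMap<>();
        for (UUID id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Keep only the ids that were seen at least twice.
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    /**
     * True if the id is a version 4 UUID of the standard (RFC 4122)
     * variant, which is what UUID.randomUUID() produces.
     */
    static boolean isRandomUuid(UUID id) {
        return id.version() == 4 && id.variant() == 2;
    }

    public static void main(String[] args) {
        UUID x = UUID.randomUUID();
        List<UUID> ids = Arrays.asList(x, UUID.randomUUID(), x);
        System.out.println("duplicates: " + duplicates(ids));
        System.out.println("version 4?  " + isRandomUuid(x));
    }
}
```

If the supposed duplicates turn out not to be version 4 UUIDs, or the same value was simply written twice by a retry, the problem lies somewhere other than the random generator.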
My concern is that because all 4 services use the same algorithm, the probability will increase.
As I said, the number of generators does not alter the mathematics.