Search code examples
mathrandombirthday-paradox

What is the general formula for calculating the probability of generating a duplicate random number?


I am generating random order tracking numbers that should be short, yet should not be duplicate.

For a tracking number consisting of 3 digits, we will generate a duplicate random number after 40 attempts on average.

If we bump it to 12 digits, it will take on average 1.3 million attempts to generate a duplicate random number.

What is a more general formula for calculating how many attempts on average it will take to generate a duplicate random number up to a predefined limit?

Empirically, I can figure it out using this code, but I am looking for a more general solution:

/**
 * Calculates the average iteration at which we will generate
 * a random integer that has been generated before.
 * This numbers is much smaller than expected due to birthday paradox.
 */

// Generate random numbers up to (exclusive) this number:
const RANGE = 1000000000000;

let foundCollisions = 0;
const foundAt = [];

while(true) {
  let i = 0;
  const exists = {}

  while(true) {
    i++;

    const random = Math.floor(Math.random() * RANGE);

    if (exists[random]) {
      // We found a duplicate number:
      break;
    } else {
      exists[random] = true;
    }
  }

  foundCollisions++;
  foundAt.push(i);

  // Calculate the average iteration at which we found a duplicate number:
  const averageFoundAt = foundAt.reduce((total, num) => total + num) / foundCollisions;
  console.log(`Found ${foundCollisions} collisions, average at ${Math.round(averageFoundAt)}th iteration.`);
}


Solution

  • Average value of tryouts before duplicate is described at Wiki birthday paradox page and

    n(average) = 1 + Q(M)
    

    where M is your range and

    Q(M) = Sum[k=1..M](M!/(M-k)!M^k)
    

    Ramanujan approximation gives 40.3 for M=1000 and 1253314 for 10^12