Search code examples
mathmit-scratch

Incorrect result percentage simulating the birthday paradox


scratch code

I saw a video about the birthday problem where if you had a room with 23 people, there is a 50% chance that two people share a birthday. Rather than just understanding the math, I wanted to run a quick simulation, so I was very hasty and whipped up this code in scratch.

Just for some context, I've only been coding for around a year, having taken a class in high school and having some outside experience. Clearly, when I run the simulation, the true percentage comes to be 5% which is one order of magnitude than expected. What exactly is wrong with the code?

I wrote the code to only run 1000 simulations and I would just have the amount of times it matched to get a rough estimate, but I was only getting around 50-70 rather than 500. So, I rewrote the code to make it run indefinitely and just create a nice percentage.


Solution

  • There are a couple issues here. The first is the repeat until loop. All this does is loop down through the list of 23 unsorted birthdays and compares each birthday against the one immediately next to it. The logic you actually want compares each birthday to all other birthdays, not just the one next to it.

    Another issue is that the comparison is not between two birthdays, but between the counter variable i and a birthday. Instead of:

    if i = birthday[i - 1]
    

    you want something more like:

    if birthday[i] = birthday[i - 1]
    

    ...except using birthday[i] = birthday[j], where j is every other birthday in the list of birthdays.

    Here's heavily-commented JavaScript code you can try to adapt to fix your Scratch code. You can treat it like pseudocode, making it a bit more interesting than if I just gave you the answer in Scratch.

    // Declare a variable to keep track of how
    // many simulations returned a matching birthday.
    let matchingBirthdays = 0;
    
    // The total number of simulations, a constant.
    const simulations = 1000;
    
    // For each simulation, we will pick 23
    // random birthdays for the year.
    const numberOfBirthdaysToTest = 23;
    
    // Now loop `simulations` times and run each simulation.
    for (let i = 0; i < simulations; i++) {
      
      // Start the simulation by making an empty
      // array (list) of 23 birthdays.
      const birthdays = [];
      
      // For each of the 23 days...
      for (let j = 0; j < numberOfBirthdaysToTest; j++) {
    
        // ...pick a random day of the year
        // and add it to the `birthdays` list.
        birthdays.push(Math.floor(Math.random() * 365));
      } // end random birthday array creation block
      
      // Show the birthdays that were selected (if you want):
      // console.log(birthdays)
    
      // Now, let's find matching birthdays.
      // Let's assume we won't find any matches:
      let found = false;
    
      // Then loop over every birthday to try to find a match...
      for (let j = 0; !found && j < birthdays.length; j++) {
      
        // For birthdays[j], see if there's a birthdays[k]
        // later in the array that has the same value.
        for (let k = j + 1; k < birthdays.length; k++) {
    
          // Now we have two different birthdays
          // pointed to by indices j and k.
          // Determine if they are equal:
          if (birthdays[j] === birthdays[k]) {
          
            // We found a match! Add 1 to the
            // total number of birthdays:
            matchingBirthdays++;
            
            // Optionally log the hit:
            // console.log("Found match!");
            
            // Then set `found` to true and break out of
            // both loops to end the whole search.
            // We'll move on to the next simulation.
            found = true;
            break;
          } // end birthday match block
        } // end search block
      
        // If we got here and `found` is still false,
        // then no match was found, and
        // `matchingBirthdays` will not change.
      } // end loop over all birthdays
    } // end simulation block
    
    // Print the final ratio of simulations that have
    // at least a pair of matching birthdays over the
    // total number of simulations. This should be around
    // 0.5 assuming we've done at least 1000 simulations.
    console.log(matchingBirthdays / simulations);

    Note that this code intentionally avoids clever JavaScript functions that would simplify the code, but don't translate well to Scratch.

    This project is a great illustration of the complexity (growth rate) of nested loops. The reason it's called the birthday paradox is that it's surprising that only 23 people are needed to achieve a 50% probability of at least one pair sharing a birthday. Rather than 23 comparisons, there are 253 comparisons available to find a single match among.

    In plain English, for each person, there are 22 other people to compare against. In code terms, that's going to be a nested loop. Nested loops do a lot more work even if you only increase the number of elements by a small amount. Try increasing the numberOfBirthdaysToTest variable and you'll see it quickly approaches 100% probability of a match, even without getting anywhere near 365.

    This phenomenon is known as O(n^2) or a quadratic algorithm. Here's the growth rate of a quadratic algorithm:

    const nestedLoop = n => {
      let iterations = 0;
    
      for (let i = 0; i < n; i++) {
        for (let j = 0; j < n; j++) {
          iterations++;
        }
      }
    
      return iterations;
    };
    
    for (let n = 0; n < 15; n++) {
      const iterations = nestedLoop(n);
      console.log(`n = ${n}, iterations = ${iterations}`);
    }

    There's a chart on Wikipedia illustrating the growth rate as n (the number of people) increases. Hopefully this helps build intuition for determining why your solution is wrong: without seeing that nested loop in your code, you won't see that "explosive growth" effect that is characteristic of the birthday problem.