Search code examples
pythondatetimefor-loopnested-loopsbirthday-paradox

How to tackle the Birthday Paradox Problem in Python?


I'm practicing the Birthday Paradox problem in Python. I've run it a bunch of times, with changing the random number of birthdays and **loop run number **, but the probability is either 0 or 100%, and I was unable to get other probability like 50% etc. Can someone help me look through my code and see what I did wrong? Thank you so much!!

from random import randint
from datetime import datetime, timedelta


first_day_of_year = datetime(2017, 1, 1)
num_of_ppl = 45
birthdays = []

# get 45 random birthdays list
for i in range(num_of_ppl):
    new_birthday = first_day_of_year + timedelta(days = randint(0, 365))
    birthdays.append(new_birthday)


# find if there's matched birthdays, run 10000 times
dups = 0 
duplicates = set()
for i in range(10000):
    for bday in birthdays:
        if birthdays.count(bday) > 1:
            duplicates.add(bday)
    if len(duplicates) >= 1:
        dups += 1


# calculate the probability
probability = dups/10000 * 100
print(probability)

Solution

  • If you generate the birthdays list each time, the probability is as expected. Also I didn't see a need to use datetime or set objects, I just replaced them with ints and bools without changing anything functionally. Also, you can use list comprehension in order to generate the birthdays list in one line:

    from random import randint
    
    num_iterations = 10000
    num_people = 45
    num_duplicates_overall = 0
    
    # generate a random birthday for each person, check if there was a duplicate,
    # and repeat num_iterations times
    for i in range(num_iterations):
        # start with a new, empty list every time.
        # get a list of random birthdays, of length num_people.
        birthdays = [randint(0, 365) for _ in range(num_people)]
        # Keep track of whether or not there was a duplicate for this iteration
        was_duplicate = False
        for bday in birthdays:
            if birthdays.count(bday) > 1:
                # We found a duplicate for this iteration, so we can stop checking
                was_duplicate = True
                break
        if was_duplicate:
            num_duplicates_overall += 1
    
    probability = num_duplicates_overall / num_iterations
    print(f"Probability: {probability * 100}%")
    

    Output with num_iterations = 1000000 and num_people = 23:

    Probability: 50.6452%
    

    Edit: Alternatively, there's this method to check for duplicates which is supposedly faster (but mainly I like it because it's on one line):

    if len(birthdays) != len(set(birthdays)):
        num_duplicates_overall += 1
    

    So, your code could look as simple as this:

    from random import randint
    
    num_iterations = 10000
    num_people = 45
    num_duplicates_overall = 0
    
    for i in range(num_iterations):
        birthdays = [randint(0, 365) for _ in range(num_people)]
        if len(birthdays) != len(set(birthdays)):
            num_duplicates_overall += 1
    
    probability = num_duplicates_overall / num_iterations
    print(f"Probability: {probability * 100}%")