
How to calculate the probability of numbers matching between two lists? (Python)


I'm trying to make a lottery program with a coupon of 6 numbers of my choice from 1 to 10 (I changed the numbers to keep this example simple). The following script generates 6 random numbers in the range 1 to 10 and finds the intersection between them and my coupon, but I would like to calculate the chance (ratio) of, for example, 4 numbers matching between the random draw and my coupon. Any ideas?

import random

mc = [9, 6, 5, 4, 8, 1]
mycoupon = set(mc)
for _ in range(100):
    # range's upper bound is exclusive, so range(1, 11) covers 1 to 10
    r = random.sample(range(1, 11), 6)
    draws = set(r)

    # Coupon and draw both hold distinct numbers, so the number of
    # matches is simply the size of their intersection
    counts = len(mycoupon & draws)

    print("My coupon: ", mycoupon)
    print("Draw: ", draws)
    print("Matches: ", counts)

Solution

  • This is really a probability question more than a programming question. But no worries, probability is cool.

    So, in your example, there are 6 elements on a ticket. Suppose the number of possible values on each element is N (N=10 in your example).

    I assume that for each element, all N values are equally likely. I also assume that each element's value is chosen independently from the others. This means that each of the N^6 possible tickets is equally likely.

    That means we can compute the probability of any condition by counting the tickets that satisfy it and dividing by N^6. For example, the chance of a ticket matching all 6 numbers is 1 / N^6, because exactly one ticket matches all 6.

    To find the probability of matching exactly 4 numbers, we just have to count the tickets that match exactly 4 elements. We can count them by considering this process for generating tickets:

    • First, select which 4 elements match, setting them equal to the true values.
    • Then select the values of the remaining elements, setting them to any value but the matching value.

    This process can generate any ticket matching exactly 4 elements, and it can't generate a ticket with more or fewer than 4 matching elements. Each such ticket also comes from exactly one run of the process, because the ticket itself determines which 4 elements match. So counting the possible outcomes of this process gives the number of tickets matching exactly 4 elements.
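A brute-force check of this counting argument can be written by enumerating every ticket for a small N. This is a sketch under the answer's model (ordered tickets, each element independently one of N values, so repeats are allowed); the helper name `count_exact_matches` and the winning ticket are made up for illustration, and N=5 keeps the N^6 enumeration small.

```python
from itertools import product


def count_exact_matches(winning, n_values, k):
    """Count tickets (tuples of values 1..n_values, same length as `winning`)
    that agree with `winning` in exactly k positions. Hypothetical helper."""
    count = 0
    for ticket in product(range(1, n_values + 1), repeat=len(winning)):
        # True counts as 1, so this sums the matching positions
        matches = sum(a == b for a, b in zip(ticket, winning))
        if matches == k:
            count += 1
    return count


N = 5
winning = (1, 2, 3, 4, 5, 1)  # illustrative winning ticket; repeats allowed in this model

print(count_exact_matches(winning, N, 4), 15 * (N - 1) ** 2)  # prints: 240 240
print(count_exact_matches(winning, N, 6))  # prints: 1
```

The enumerated count agrees with 15 * (N - 1)^2 regardless of which winning ticket you pick, which is what the argument predicts.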

    The first step has 6 choose 4 = 15 possible choices. The second step has (N-1)^2 possible outcomes: each of the 2 remaining elements can take any of the N-1 values other than the winning one.

    Therefore, the number of tickets that match exactly 4 out of 6 is 15 * (N - 1)^2, for a probability of 15 * (N-1)^2 / N^6. In your example of N=10, that's 15 * 9^2 / 10^6 = 0.001215.
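The figure above is for the model in this answer, where position matters and values can repeat. The question's code draws differently: `random.sample` picks 6 distinct numbers, and matches are counted as a set intersection, ignoring order. Under that model the same count-and-divide idea gives a hypergeometric probability, C(6, k) * C(N - 6, 6 - k) / C(N, 6): choose which k coupon numbers are drawn, then fill the other 6 - k draws from the N - 6 numbers not on the coupon. The sketch below assumes Python 3.8+ (for `math.comb`); the function names are hypothetical, and the simulation just repeats the question's loop many times to estimate the same ratios empirically.

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb


def prob_set_matches(pool_size, pick, k):
    """Exact chance that a fixed coupon of `pick` distinct numbers shares exactly k
    numbers with a draw of `pick` distinct numbers from 1..pool_size (order ignored)."""
    # Ways to choose which k coupon numbers are drawn, times ways to fill the other
    # pick - k draws from numbers not on the coupon (comb returns 0 if there are too few).
    favorable = comb(pick, k) * comb(pool_size - pick, pick - k)
    return Fraction(favorable, comb(pool_size, pick))


def simulate_set_matches(coupon, pool_size, trials, seed=0):
    """Monte Carlo estimate of P(exactly k matches), mirroring the question's loop."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        draw = set(rng.sample(range(1, pool_size + 1), len(coupon)))
        tally[len(coupon & draw)] += 1
    return {k: tally[k] / trials for k in sorted(tally)}


coupon = {9, 6, 5, 4, 8, 1}
print(prob_set_matches(10, 6, 4))  # prints: 3/7
print(simulate_set_matches(coupon, 10, 50_000))  # estimates; the entry for 4 should be near 0.43
```

With only 10 numbers and 6 picks, at least 2 numbers always match, so the probabilities for k = 0 and k = 1 come out as 0.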

    In the general case of a ticket of length T, where each element takes one of N values and you want to match exactly k elements, the probability is (T choose k) * (N-1)^(T-k) / N^T.
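For reference, this formula is the binomial distribution with success probability 1/N per element, and summing the counts over all k recovers all N^T tickets (by the binomial theorem), which makes a handy sanity check:

```latex
P(\text{exactly } k) = \binom{T}{k}\frac{(N-1)^{T-k}}{N^{T}}
  = \binom{T}{k}\left(\frac{1}{N}\right)^{k}\left(1-\frac{1}{N}\right)^{T-k},
\qquad
\sum_{k=0}^{T}\binom{T}{k}(N-1)^{T-k} = \bigl((N-1)+1\bigr)^{T} = N^{T}.
```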

    Probably the most convenient way to write this in Python is to compute it as written: count the matching tickets as an integer, then divide by N^T. Python integers have arbitrary precision (Python 2 also promoted int to long automatically), so the count can't overflow or lose precision however large it gets; only the final division to a float rounds.
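A minimal sketch of that approach, assuming Python 3.8+ for `math.comb`; `prob_exact_matches` is a hypothetical name. It keeps the ticket count as an exact integer and returns a `fractions.Fraction`, so nothing is rounded until you choose to convert to `float`.

```python
from fractions import Fraction
from math import comb


def prob_exact_matches(T, k, N):
    """P(exactly k of T elements match) when each element is independently
    uniform over N values (the model assumed in this answer)."""
    matching_tickets = comb(T, k) * (N - 1) ** (T - k)  # exact integer count
    return Fraction(matching_tickets, N ** T)


p = prob_exact_matches(6, 4, 10)
print(p, float(p))  # prints: 243/200000 0.001215

# Large parameters still work: the integers simply grow as needed.
big = prob_exact_matches(200, 3, 1000)
print(float(big))
```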