math probability

How to add multiple independent probabilities to determine the overall probability of a single output


I apologize in advance for any confusing explanations, but I will try to be as clear as possible.

If there are multiple indicators that predict an outcome with a known accuracy, and they are all attempting to predict the same result, how do you properly add the probabilities?

For example, if John and David are taking a test, and historically John answers 80% of questions correctly, and David answers 75% of questions correctly, and both John and David select the same answer on a question, what is the probability that they are correct? Let's assume that John and David are completely independent of each other and that all questions are equally difficult.

I would think that the probability that they are correct is higher than 80%, so I don't think averaging makes sense.


Solution

  • Thanks to Robert, who commented on this question, I was able to figure out that what I was looking for is a well-known problem solved with Bayes' theorem, which is used to update existing probabilities given new information. I won't go further into the intuition behind it, but 3Blue1Brown has a very good video on the topic.

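    Applied to this scenario, Bayes' theorem gives: P(both correct | both agree) = (P(A)*P(B)) / (P(A)*P(B) + P(!A)*P(!B))

    Where: P(A) is the probability that the first predictor is correct, P(!A) is 1 - P(A), P(B) is the probability that the second predictor is correct, and P(!B) is 1 - P(B). This form assumes that the two predictors are independent and that each question has only two possible answers, so two wrong answers always agree with each other.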

    Using this equation in the scenario in the question, if John has an 80% chance of being right and David has a 75% chance of being right, and both agree, then the chance that they are both correct is (0.8 * 0.75) / (0.8 * 0.75 + 0.2 * 0.25) = 0.6 / 0.65, or about 92.3%.
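
    For reference, here is a rough sketch of how the equation falls out of Bayes' theorem. It rests on assumptions that are not spelled out in the question: the two predictors are independent, each question has exactly two possible answers, and \lnot A is just LaTeX notation for !A.

    ```latex
    \[
    \begin{aligned}
    P(\text{both correct} \mid \text{agree})
      &= \frac{P(\text{agree} \mid \text{both correct}) \, P(\text{both correct})}{P(\text{agree})} \\
      &= \frac{1 \cdot P(A) \, P(B)}{P(A) \, P(B) + P(\lnot A) \, P(\lnot B)}
    \end{aligned}
    \]
    ```

    The denominator is the total probability of agreement: with only two possible answers, the predictors agree exactly when both are right or both are wrong.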

    To check this, I wrote a simple Python script that simulates this exact scenario n times and prints out the result. In this code, two "experts" each have a set probability of being correct, and their accuracy is tracked individually and for the cases where they agree.

    import random
    
    TRIALS = 1000000
    
    exp1_correct = 0
    exp2_correct = 0
    combined_correct = 0  # trials where both experts are correct
    consensus_count = 0   # trials where the experts agree (both right or both wrong)
    
    for _ in range(TRIALS):
        # True means the expert answered correctly on this trial
        expert1 = random.random() < 0.8
        expert2 = random.random() < 0.75
    
        if expert1 and expert2:
            combined_correct += 1
        if expert1:
            exp1_correct += 1
        if expert2:
            exp2_correct += 1
        if expert1 == expert2:
            consensus_count += 1
    
    print(f'Expert 1 had an accuracy of {exp1_correct / TRIALS}')
    print(f'Expert 2 had an accuracy of {exp2_correct / TRIALS}')
    # Accuracy given agreement: both correct, out of all trials where they agreed
    print(f'Consensus had an accuracy of {combined_correct / consensus_count}')
    

    Running this prints a consensus accuracy close to 0.923, which agrees with the equation above. Hopefully this is helpful to someone who has the same question that I did!
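
For anyone who wants the number without running a simulation, the equation can also be evaluated directly. This is only a minimal sketch: the function name `agreement_posterior` is made up for this example, and extending the formula from two predictors to a whole list assumes that all predictors are independent, that they all give the same answer, and that each question has exactly two possible answers.

```python
from math import prod


def agreement_posterior(accuracies):
    """Probability that a unanimous answer is correct.

    Assumes independent predictors and a question with exactly two
    possible answers, so predictors that are all wrong also agree.
    """
    all_correct = prod(accuracies)
    all_wrong = prod(1 - p for p in accuracies)
    return all_correct / (all_correct + all_wrong)


# John (80%) and David (75%) from the question
print(round(agreement_posterior([0.8, 0.75]), 3))  # 0.923
```

With a single predictor the function just returns that predictor's own accuracy, which is a quick sanity check on the formula.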