
Calculate probability of a flush in poker


I have code that keeps going through a loop until a flush is made. Now I am trying to use a count to track how many hands are dealt, then divide one by that count to get the probability.

With the code I have right now, count is printed as 0.

from collections import namedtuple
from random import shuffle

Card = namedtuple("Card", "suit, rank")

class Deck:
    suits = '♦♥♠♣'
    ranks = '23456789JQKA'

    def __init__(self):
        self.cards = [Card(suit, rank) for suit in self.suits for rank in self.ranks]
        shuffle(self.cards)

    def deal(self, amount):
        return tuple(self.cards.pop() for _ in range(amount))

flush = False
count = 0
while not flush:

    deck = Deck()
    stop = False
    while len(deck.cards) > 5:
        hand = deck.deal(5)
        # (Card(suit='♣', rank='7'), Card(suit='♠', rank='2'), Card(suit='♥', rank='4'), Card(suit='♥', rank='K'), Card(suit='♣', rank='3'))

        if len(set(card.suit for card in hand)) > 1:
            #print(f"No Flush: {hand}")
            continue
        print(f"Yay, it's a Flush: {hand}")
        flush = True
        break
        if flush:
            break
        else:
            count +=1
print(f'Count is {count}')

There is a little more code at the top used for init methods; if you need that too, let me know.


Solution

  • Your code (and what is available in @Mason's answer) will estimate the probability of eventually getting your first flush. To estimate the probability of getting a flush in general, which I believe is what you're after, you have to run that experiment many thousands of times over. In practice this is called a Monte Carlo simulation.
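
    (As an aside, the reason your count prints 0: everything after the unconditional break is unreachable, so count += 1 never executes. A minimal reworking of your loop - counting each hand before the flush check, and with the missing 'T' (ten) added to ranks so the deck has the full 52 cards - would look something like this:)

```python
from collections import namedtuple
from random import shuffle

Card = namedtuple("Card", "suit, rank")

class Deck:
    suits = '♦♥♠♣'
    ranks = '23456789TJQKA'  # 'T' (ten) added; the original string omitted it

    def __init__(self):
        self.cards = [Card(suit, rank) for suit in self.suits for rank in self.ranks]
        shuffle(self.cards)

    def deal(self, amount):
        return tuple(self.cards.pop() for _ in range(amount))

flush = False
count = 0
while not flush:
    deck = Deck()
    while len(deck.cards) > 5:
        count += 1                      # count the hand BEFORE any break can skip this line
        hand = deck.deal(5)
        if len(set(card.suit for card in hand)) == 1:
            print(f"Yay, it's a Flush: {hand}")
            flush = True
            break

print(f'Count is {count}')              # hands dealt up to and including the first flush
```

    With that ordering, count ends up as the number of hands it took to see the first flush, and 1/count is one (very noisy) estimate of the flush probability.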

    Side note: When I began learning about Monte Carlos I thought they were a sort of "magical", mysteriously complex thing... mostly because their name sounds so exotic. Don't be fooled. "Monte Carlo" is just an overly fancy and arbitrary name for "simulation". They can be quite elementary.

    Even so, simulations are kind of magical because you can use them to brute force a solution out of a complex system even when a mathematical model of that system is hard to come by. Say, for example, you don't have a firm understanding of combination or permutation math - which would produce the exact answer to your question "What are the odds of getting a flush?". We can run many simulations of your card game to figure out what that probability would be to a high degree of certainty. I've done that below (commented out parts of your original code that weren't needed):

    from collections import namedtuple
    from random import shuffle
    import pandas as pd
    
    #%% What is the likelihood of getting a flush? Mathematical derivation
    """ A flush consists of five cards which are all of the same suit.
    We must remember that there are four suits each with a total of 13 cards.
    Thus a flush is a combination of five cards from a total of 13 of the same suit.
    This is done in C(13, 5) = 1287 ways.
    Since there are four different suits, there are a total of 4 x 1287 = 5148 flushes possible. 
    Some of these flushes have already been counted as higher ranked hands.
    We must subtract the number of straight flushes and royal flushes from 5148 in order to
    obtain flushes that are not of a higher rank.
    There are 36 straight flushes and 4 royal flushes.
    We must make sure not to double count these hands.
    This means that there are 5148 – 40 = 5108 flushes that are not of a higher rank. 
    We can now calculate the probability of a flush as 5108/2,598,960 = 0.1965%.
    This probability is approximately 1/509. So in the long run, one out of every 509 hands is a flush."""
    # SOURCE: https://www.thoughtco.com/probability-of-a-flush-3126591
    
    mathematically_derived_flush_probability = 5108/2598960 * 100
    
    #%% What is the likelihood of getting a flush? Monte Carlo derivation
    
    Card = namedtuple("Card", "suit, rank")
    
    class Deck:
        suits = '♦♥♠♣'
        ranks = '23456789TJQKA'  # 'T' (ten) is needed; without it the deck has only 48 cards
    
        def __init__(self):
            self.cards = [Card(suit, rank) for suit in self.suits for rank in self.ranks]
            shuffle(self.cards)
    
        def deal(self, amount):
            return tuple(self.cards.pop() for _ in range(amount))
    
    #flush = False
    hand_count = 0
    flush_count = 0
    flush_cutoff = 150 # Increase this number to run the simulation over more hands.
    column_names = ['hand_count', 'flush_count', 'flush_probability', 'estimation_error']
    hand_results = pd.DataFrame(columns=column_names)
    
    while flush_count < flush_cutoff:
        deck = Deck()
        while len(deck.cards) > 5:
            hand_count +=1
            hand = deck.deal(5)
            # (Card(suit='♣', rank='7'), Card(suit='♠', rank='2'), Card(suit='♥', rank='4'), Card(suit='♥', rank='K'), Card(suit='♣', rank='3'))
            if len(set(card.suit for card in hand)) == 1:
    #            print(f"Yay, it's a Flush: {hand}")
                flush_count +=1
    #            break
    #        else:
    #            print(f"No Flush: {hand}")
            monte_carlo_derived_flush_probability = flush_count / hand_count * 100
            estimation_error = (monte_carlo_derived_flush_probability - mathematically_derived_flush_probability) / mathematically_derived_flush_probability * 100
            hand_df = pd.DataFrame([[hand_count,flush_count,monte_carlo_derived_flush_probability, estimation_error]], columns=column_names)
            hand_results = pd.concat([hand_results, hand_df], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
    
    #%% Analyze results
    # Show how each consecutive hand helps us estimate the flush probability
    hand_results.plot.line('hand_count','flush_probability').axhline(y=mathematically_derived_flush_probability,color='r')
    
    # As the number of hands (experiments) increases, our estimation of the actual probability gets better.
    # Below the error gets closer to 0 percent as the number of hands increases.
    hand_results.plot.line('hand_count','estimation_error').axhline(y=0,color='black')
    
    #%% Memory usage
    print("Memory used to store all %s runs: %s megabytes" % (len(hand_results),round(hand_results.memory_usage(index=True,deep=True).sum()/1000000, 1)))
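
    One performance note on the listing above: building the results frame one row at a time copies data on every iteration, which is why the memory and runtime costs grow quickly. A common alternative is to collect plain dicts in a list and construct the DataFrame once at the end. A simplified sketch of the same bookkeeping (only suits are tracked, since ranks don't matter for flush detection):

```python
import random
import pandas as pd

SUITS = '♦♥♠♣'

def simulate(flush_cutoff=20, seed=0):
    """Deal 5-card hands until flush_cutoff flushes are seen; record each hand."""
    rng = random.Random(seed)
    rows, hand_count, flush_count = [], 0, 0
    while flush_count < flush_cutoff:
        # A deck reduced to suits only: 13 cards of each of the 4 suits.
        deck = [s for s in SUITS for _ in range(13)]
        rng.shuffle(deck)
        while len(deck) > 5:
            hand, deck = deck[:5], deck[5:]
            hand_count += 1
            if len(set(hand)) == 1:     # all five cards share one suit -> flush
                flush_count += 1
            rows.append({'hand_count': hand_count,
                         'flush_count': flush_count,
                         'flush_probability': flush_count / hand_count * 100})
    return pd.DataFrame(rows)           # one allocation instead of one per hand

results = simulate()
print(results.tail(1))
```

    The per-hand probabilities are identical to the full-deck version, because a flush depends only on the suits.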
    

    In this particular case, thanks to math, we could have just confidently derived the probability of getting a flush as 0.1965%. To prove that our simulation arrived at the correct answer we can compare its output after 80,000 hands:
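
    That closed-form count is easy to verify in a few lines with Python's math.comb:

```python
from math import comb

flushes_any = comb(13, 5) * 4                     # 5 same-suit cards, over 4 suits
straight_or_royal = 10 * 4                        # 10 straight flushes per suit, incl. the royal
plain_flushes = flushes_any - straight_or_royal   # flushes not of a higher rank
total_hands = comb(52, 5)                         # all 5-card hands
probability = plain_flushes / total_hands

print(f"{plain_flushes} / {total_hands} = {probability:.4%}")  # -> 5108 / 2598960 = 0.1965%
```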

    [Plot: flush_probability vs. hand_count - the simulated value converges toward the mathematically derived probability line]

    As you can see, our simulated flush_probability (in blue) approaches the mathematically derived probability (the red horizontal line).

    Similarly, below is a plot of the estimation_error between the simulated probability and the mathematically derived value. As you can see, the estimation error was more than 100% off in the early runs of the simulation but gradually shrank to within 5%.

    [Plot: estimation_error vs. hand_count - the error approaches the 0% reference line as more hands are dealt]

    If you were to run the simulation for, say, twice the number of hands, we would see the simulated lines in both charts eventually settle onto their reference lines - signifying that the simulated answer becomes equivalent to the mathematically derived answer.

    To simulate or not to simulate?

    Finally, you might wonder,

    "If I can generate a precise answer to a problem by simulating it, then why bother with all the complicated math in the first place?"

    The answer is, as with just about any decision in life, "trade-offs".

    In our example, we could run the simulation over enough hands to get a precise answer with a high degree of confidence. However, if you are running a simulation because you don't know the answer (which is often the case), then you need to answer another question,

    "How long do I run the simulation to be confident I have the right answer?"

    The answer to that seems simple:

    "Run it for a long time."

    Eventually your estimated outputs could converge to a single value, such that outputs from additional simulations don't drastically change from prior runs. The problem is that in some cases, depending on the complexity of the system you're simulating, seemingly convergent output may be a temporary phenomenon. That is, if you ran a hundred thousand more simulations, you might begin to see your outputs diverge from what you thought was your stable answer. In a different scenario, despite having run tens of millions of simulations, an output may still not have converged. Do you have the time to program and run the simulation? Or would a mathematical approximation get you there sooner?
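
    For a simple yes/no experiment like this one there is actually a shortcut: the standard error of an estimated proportion p after n trials is sqrt(p(1-p)/n), so precision improves like 1/sqrt(n). A rough sketch of how many hands a given target precision requires (assuming the true p from the derivation above):

```python
from math import ceil

p = 5108 / 2598960          # true flush probability, from the derivation above

def hands_needed(target_relative_error):
    """Trials n such that one standard error <= target_relative_error * p."""
    # sqrt(p*(1-p)/n) <= target * p  =>  n >= (1 - p) / (p * target**2)
    return ceil((1 - p) / (p * target_relative_error ** 2))

for target in (0.10, 0.05, 0.01):
    print(f"within ~{target:.0%} relative error: about {hands_needed(target):,} hands")
```

    The rare-event nature of the flush (small p) is exactly what makes the required n large - one reason a closed-form answer, when available, is so much cheaper.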

    There is yet another concern:

    "What is the cost?"

    Consumer computers are relatively cheap today but 30 years ago they cost $4,000 to $9,000 in 2019 dollars. In comparison, a TI89 only cost $215 (again, in 2019 dollars). So if you were asking this question back in 1990 and you were good with probability math, you could have saved $3,800 by using a TI89. Cost is just as important today: simulating self-driving cars and protein folding can burn many millions of dollars.

    Finally, mission critical applications may require both a simulation and a mathematical model to cross check the results of both approaches. A tidy example of this is when Matt Parker of StandUpMaths calculated the odds of landing on any property in the game of Monopoly by simulation and confirmed those results with Hannah Fry's mathematical model of the same game.