Search code examples
rprobabilitymean

Probability of a natural 21 in blackjack (using R) is dropping all combinations of "10 diamonds", "10 hearts", "10 spades", etc. WHY?


I am trying to use R to find the probability that I get a natural 21 in Blackjack (i.e., one ace card and one face card).

Here is my code:

library(tidyverse)
library(gtools)

## step 1: build a deck of cards

suit<-c("diamonds","clubs","hearts","spades")

number<-c("ace","2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king")

expand<-expand.grid(number=number, suit=suit)

deckofcards<-paste(expand$number,expand$suit)

## step 2: build a vector to be compared against

aces <- paste("ace", suit)
facecard <- c("king", "queen", "jack", "10")
facecard <- expand.grid(num = facecard, suit = suit)
facecard <- paste(facecard$num, facecard$suit)

Here is what aces and facecard looks like:

enter image description here

## find probability of one aces and one facecard

hands <- combinations(52, 2, v=deckofcards) # all possible hands

mean(hands[,1] %in% aces & hands[,2] %in% facecard) # this is the probability

My code produces a probability of 0.0361991, but it should be 0.04826546. I figured out that there is a problem in the last line that is failing to read "10 diamonds", "10 hearts", "10 spades", "10 clubs". You can see it here:

## find the problem: there should be 64 obs

hands_df<-as.data.frame(hands)
ggg<- hands_df %>% filter(hands[,1] %in% aces & hands[,2] %in% facecard)

You will see that "ggg" has 48 observations, while it should have 64. Something is omitting all the 16 rows or combinations that have "10 diamonds", "10 hearts", etc.

If I replace "10" in the number vector and facecard to "ten" then it works fine and gives me the right probability and dataframe (64 obs).

But I don't understand why the character "10" is not being accepted, and where in the code it is not being accepted? Both "10" and "ten" are exactly the same thing: characters. So what is the problem here?


Solution

  • MrFlick is correct. You are assuming the combination will be in a particular order, but it may not be. It is true that the output of cominations is ordered already, but a facecard char value can be lexographically greater or less than an ace card, so some pairs will have the ace first and some will have the facecard first. You still have to check both (or compare against a set of pre-ordered pairs)

    '10 diamonds' > 'ace diamonds'
    # [1] FALSE
    'king diamonds' > 'ace diamonds'
    # [1] TRUE
    

    Here you can see that 'ace diamonds' can be in either of the two columns, one side for '10 diamonds' and anohter for 'king diamonds', as you would expect based on the comparisons above.

    hands %>% 
      as.data.frame %>% 
      filter(if_any(everything(), str_detect, 'ace'),
             if_all(everything(), str_detect, 'diamonds'))
    
    #              V1             V2
    # 1   10 diamonds   ace diamonds
    # 2    2 diamonds   ace diamonds
    # 3    3 diamonds   ace diamonds
    # 4    4 diamonds   ace diamonds
    # 5    5 diamonds   ace diamonds
    # 6    6 diamonds   ace diamonds
    # 7    7 diamonds   ace diamonds
    # 8    8 diamonds   ace diamonds
    # 9    9 diamonds   ace diamonds
    # 10 ace diamonds  jack diamonds
    # 11 ace diamonds  king diamonds
    # 12 ace diamonds queen diamonds
    

    Changing the code to check both gives the expected result

    mean((hands[,1] %in% aces & hands[,2] %in% facecard) |
           (hands[,2] %in% aces & hands[,1] %in% facecard)) # this is the probability
    # [1] 0.04826546
    

    If you're familiar with the multivariate hypergeometric distribution, you could also skip a whole lot of coding by using a package that has a function for that

    library(extraDistr)
    
    dmvhyper(c(1, 1, 0), (52/13)*c(1, 4, 8), 2)
    # [1] 0.04826546