I am trying to use R to find the probability that I get a natural 21 in Blackjack (i.e., one ace card and one face card).
Here is my code:
library(tidyverse)
library(gtools)
## step 1: build a deck of cards
suit<-c("diamonds","clubs","hearts","spades")
number<-c("ace","2", "3", "4", "5", "6", "7", "8", "9", "10", "jack", "queen", "king")
expand<-expand.grid(number=number, suit=suit)
deckofcards<-paste(expand$number,expand$suit)
## step 2: build a vector to be compared against
aces <- paste("ace", suit)
facecard <- c("king", "queen", "jack", "10")
facecard <- expand.grid(num = facecard, suit = suit)
facecard <- paste(facecard$num, facecard$suit)
Here is what aces and facecard looks like:
## find probability of one aces and one facecard
hands <- combinations(52, 2, v=deckofcards) # all possible hands
mean(hands[,1] %in% aces & hands[,2] %in% facecard) # this is the probability
My code produces a probability of 0.0361991, but it should be 0.04826546. I figured out that there is a problem in the last line that is failing to read "10 diamonds", "10 hearts", "10 spades", "10 clubs". You can see it here:
## find the problem: there should be 64 obs
hands_df<-as.data.frame(hands)
ggg<- hands_df %>% filter(hands[,1] %in% aces & hands[,2] %in% facecard)
You will see that "ggg" has 48 observations, while it should have 64. Something is omitting all the 16 rows or combinations that have "10 diamonds", "10 hearts", etc.
If I replace "10" in the number vector and facecard to "ten" then it works fine and gives me the right probability and dataframe (64 obs).
But I don't understand why the character "10" is not being accepted, and where in the code it is not being accepted? Both "10" and "ten" are exactly the same thing: characters. So what is the problem here?
MrFlick is correct. You are assuming the combination will be in a particular order, but it may not be. It is true that the output of cominations
is ordered already, but a facecard char value can be lexographically greater or less than an ace card, so some pairs will have the ace first and some will have the facecard first. You still have to check both (or compare against a set of pre-ordered pairs)
'10 diamonds' > 'ace diamonds'
# [1] FALSE
'king diamonds' > 'ace diamonds'
# [1] TRUE
Here you can see that 'ace diamonds'
can be in either of the two columns, one side for '10 diamonds'
and anohter for 'king diamonds'
, as you would expect based on the comparisons above.
hands %>%
as.data.frame %>%
filter(if_any(everything(), str_detect, 'ace'),
if_all(everything(), str_detect, 'diamonds'))
# V1 V2
# 1 10 diamonds ace diamonds
# 2 2 diamonds ace diamonds
# 3 3 diamonds ace diamonds
# 4 4 diamonds ace diamonds
# 5 5 diamonds ace diamonds
# 6 6 diamonds ace diamonds
# 7 7 diamonds ace diamonds
# 8 8 diamonds ace diamonds
# 9 9 diamonds ace diamonds
# 10 ace diamonds jack diamonds
# 11 ace diamonds king diamonds
# 12 ace diamonds queen diamonds
Changing the code to check both gives the expected result
mean((hands[,1] %in% aces & hands[,2] %in% facecard) |
(hands[,2] %in% aces & hands[,1] %in% facecard)) # this is the probability
# [1] 0.04826546
If you're familiar with the multivariate hypergeometric distribution, you could also skip a whole lot of coding by using a package that has a function for that
library(extraDistr)
dmvhyper(c(1, 1, 0), (52/13)*c(1, 4, 8), 2)
# [1] 0.04826546