
Why are these two R snippets providing differing probabilities?


Answered below; the disparity arises due to a sorting property in the 'combinations()' function.

I'm working in R, simulating a deck of cards to study probability. When the deck is built with numeral strings ("10", "8", etc.), the computed probability of a natural 21 in blackjack differs from the one I get when all of the strings are spelled-out words ("Ten", "Eight", etc.).

With library(gtools) loaded, here is the first version, using numeral strings:

library(gtools)  # provides combinations()
suits <- c("D", "C", "H", "S")
numbers <- c("2", "3", "4", "5", "6", "7", "8", "9", "10", 
             "J", "Q", "K", "A")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)
aces <- paste("A", suits)

facecard <- c("K", "Q", "J", "10")
facecard <- expand.grid(number=facecard, suit=suits)
facecard <- paste(facecard$number, facecard$suit)

hands <- combinations(52, 2, v = deck)


mean(hands[,1] %in% aces & hands[,2] %in% facecard)

Which yields:

mean(hands[,1] %in% aces & hands[,2] %in% facecard)
#0.0361991

Why does this change when using this code?

suits <- c("D", "C", "H", "S")
numbers <- c("Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", 
             "Ten", "J", "Q", "K", "A")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)

aces <- paste("A", suits)

facecard <- c("K", "Q", "J", "Ten")
facecard <- expand.grid(number=facecard, suit=suits)
facecard <- paste(facecard$number, facecard$suit)

hands <- combinations(52, 2, v = deck)


mean(hands[,1] %in% aces & hands[,2] %in% facecard)

Yielding:

mean(hands[,1] %in% aces & hands[,2] %in% facecard)
#0.04826546

Why do these results differ when the only apparent difference is the use of spelled-out words rather than numerals in the 'numbers' vector? Is including "A" alongside the numeral strings ("9", "8", ...) causing a lower count when combining with the facecards?


Solution

  • Your code is wrong in both cases, but through a happy accident you still get the correct result once (0.048).

    In blackjack (and in your combinations() results) the order of the cards does not matter. Yet in your code you use this:

    mean(hands[,1] %in% aces & hands[,2] %in% facecard)
    

    which checks that the first card is an Ace and the second card is a facecard. Instead, your code should check that either card is an Ace and either card is a facecard, like this:

    mean(
      (hands[, 1] %in% aces | hands[, 2] %in% aces) & 
      (hands[, 1] %in% facecard | hands[, 2] %in% facecard)
    )
    

    If you use that code, you will get the correct result in both cases.

    Your two versions produce different results because gtools::combinations sorts its input vector before generating the combinations (this is its default behavior, as far as I can tell), so in every row the item that sorts first ends up in the first column. When you spell out the numbers, "A D" comes before "Ten D", and "A" also sorts before "J", "K" and "Q". Your code was therefore lucky: it looked for Aces in the first column and facecards in the second, which is exactly where the sort put them. With "10 D", however, the usual collation rules put digits before letters, so the corresponding row has "10 D" in the first column and "A D" in the second. The 16 Ace-plus-10 hands are missed, and the result drops from 64/1326 (about 0.0483) to 48/1326 (about 0.0362).
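The ordering effect described above can be reproduced with a short sketch in base R. It is an illustration under stated assumptions, not the internals of gtools: `combn()` on a pre-sorted deck stands in for `combinations()`, and `method = "radix"` is used only to force C-locale ordering so the outcome does not depend on the session locale. The names `p_ordered` and `p_either` are made up for this example.

```r
# Build the numeral-string deck from the question.
suits <- c("D", "C", "H", "S")
numbers <- c("2", "3", "4", "5", "6", "7", "8", "9", "10",
             "J", "Q", "K", "A")
deck <- expand.grid(number = numbers, suit = suits)
deck <- paste(deck$number, deck$suit)

aces <- paste("A", suits)
facecard <- expand.grid(number = c("K", "Q", "J", "10"), suit = suits)
facecard <- paste(facecard$number, facecard$suit)

# Digits sort before letters, while "A" sorts before "Ten".
sorted_numerals <- sort(c("A D", "10 D"), method = "radix")  # "10 D" "A D"
sorted_words    <- sort(c("Ten D", "A D"), method = "radix") # "A D" "Ten D"

# Assumption: sorting the deck first and then calling combn() mimics the
# sorted rows described in the answer. combn() returns one combination per
# column, so transpose to get one hand per row.
hands <- t(combn(sort(deck, method = "radix"), 2))

# Order-dependent check from the question: misses the 16 Ace-plus-10 hands.
p_ordered <- mean(hands[, 1] %in% aces & hands[, 2] %in% facecard)

# Order-independent check from the answer: counts all 64 natural-21 hands.
p_either <- mean(
  (hands[, 1] %in% aces | hands[, 2] %in% aces) &
  (hands[, 1] %in% facecard | hands[, 2] %in% facecard)
)

print(c(ordered = p_ordered, either = p_either))
```

Here `p_ordered` corresponds to 48 of the 1326 possible hands and `p_either` to 64 of them, matching the two values reported in the question. The second figure also agrees with the direct count of 4 Aces times 16 ten-value cards divided by `choose(52, 2)`.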