Search code examples
rprobability

Monty Hall game in R with base functions


Just for fun and to train R, I tried to proof the Monty Hall Game rule (changing your choice after one gate opened gives you more probability to win), I made this reproducible code (The explanation of every step is within the code):

## First I set the seed

set.seed(4)

## Then I modelize the presence of the prize as a random variable between gates 1,2,3


randomgates <- ceiling(runif(10000, min = 0, max = 3))

## so do I with the random choice.

randomchoice <- ceiling(runif(10000, min = 0, max = 3))

## As the opening of a gate is dependent from the gate you chose (the gate you chose cannot be opened)
## I modelize the opening of the gate as a variable which cannot be equal to the choice.

options <- c(1:3)

randomopen <- rep(1,10000)

for (i in 1:length(randomgates)) {
  realoptions <- options[options != randomchoice[i]]
  randomopen[i] <- realoptions[ceiling(runif(1,min = 0, max = 2))]
}

##Just to make data more easy to handle, I make a dataset

dataset <- cbind(randomgates, randomchoice, randomopen)

## Then I creat a dataset which only keeps the realization of the games in which we carry on (
## the opened gate wasn't the one with the price within)

steptwo <- dataset[randomopen != randomgates,]

## The next step is just to check if the probability of carry on is 2/3, which indeed is

carryon <- randomopen != randomgates

sum(carryon)/length(randomgates) 

## I format the dataset as a data frame

steptwo <- as.data.frame(steptwo)

## Now we check what happens if we hold our initial choice when game carries on

prizesholding <- steptwo$randomgates == steptwo$randomchoice

sum(prizesholding)

## creating a vector of changing option, dependant on the opened gate, in the dataset that
## keeps only the cases in which we carried on playing (the opened gate wasn't the one with the prize)

switchedchoice <- rep(1,length(steptwo$randomgates)) 

for (i in 1:length(steptwo$randomgates)) {
  choice <- options[options != steptwo$randomchoice[i]]
  switchedchoice[i] <- choice[ceiling(runif(1,min = 0, max = 2))]
}

## Now we check how many times you guess the prize gate when you switch your initial choice

prizesswitching <- steptwo$randomgates == switchedchoice

sum(prizesswitching)/length(steptwo$randomgates)

When I check the probability without changing my initial choice in the cases in which the game carried on (the gate opening didn't match the one with the prize) I obtain what I exepected (close 1/3 of probability of winning the prize), which refers to the following instruction:

carryon <- randomopen != randomgates

sum(carryon)/length(randomgates) 

My problem arises when I check the probability of winning the prize after changing my choice (conditionate, obviously to not having opened the door which holds the prize), instead of getting 1/2 as Monty Hall states, I get 1/4, it refers to the following instruction:

prizesswitching <- steptwo$randomgates == switchedchoice

sum(prizesswitching)/length(steptwo$randomgates)

I know that I am doing something bad because it is already more than proofed that Monty Hall holds, but I am not able to detect the flaw. Does anyone know what it is?

If you don't know what Monty Hall problem is, you can find easy-to-read information at wikipedia:

Monty Hall Game

Edit: As @Dason pointed out, one of the problem was I was introducing some kind of randomness in the changing of the initial choice, which doesn't makes sense as there is only one option left.

Other problem was that I was not approaching the problem under the assumption of Monty Hall knowing where the prize is. I changed my code from the initial to this, and the problem is solved:

# Prepare each variable for 10000 experiments

## First I set the seed

set.seed(4)

## Then I modelize the presence of the prize as a random variable between gates 1,2,3


randomgates <- ceiling(runif(10000, min = 0, max = 3))

## so do I with the random choice.

randomchoice <- ceiling(runif(10000, min = 0, max = 3))

## As the opening of a gate is dependent from the gate you chose (the gate you chose cannot be opened
##, neither the one with the prize does), I modelize the opening of the gate as a variable which cannot be equal to the choice.

options <- c(1:3)

randomopen <- rep(1,10000)

for (i in 1:length(randomgates)) {
  randomopen[i] <- options[options != randomchoice[i] & options != randomgates[i]]
}

##Just to make data more easy to handle, I make a dataset

dataset <- cbind(randomgates, randomchoice, randomopen)

## I format the dataset as a data frame

steptwo <- as.data.frame(dataset)

## Now we check what happens if we hold our initial choice when game carries on

steptwo$prizesholding <- steptwo$randomgates == steptwo$randomchoice

with(steptwo, sum(prizesholding))

## creating a vector of changing option, dependant on the opened gate, in the dataset that
## keeps only the cases in which we carried on playing (the opened gate wasn't the one with the prize)

steptwo$switchedchoice <- rep(1,length(steptwo$randomgates)) 

for (i in 1:length(steptwo$randomgates)) {
  steptwo$switchedchoice[i] <- options[options != steptwo$randomchoice[i] & options != steptwo$randomopen[i]]
}

## Now we check how many times you guess the prize gate when you switch your initial choice

steptwo$prizesswitching <- steptwo$randomgates == steptwo$switchedchoice

with(steptwo, sum(prizesswitching)/length(randomgates))

Solution

  • This seems to do the trick:

    n_iter <- 10000
    
    set.seed(4)
    
    doors <- 1:3
    prizes <- sample.int(n = 3, size = n_iter, replace = TRUE)
    your_pick <- sample.int(n = 3, size = n_iter, replace = TRUE)
    open_door <- rep(0, n_iter)
    switched_door <- rep(0, n_iter)
    
    for (i in 1:n_iter) {
      remaining_choices <- setdiff(doors, c(your_pick[i], prizes[i]))
    
      if (length(remaining_choices) > 1) {
        open_door[i] <- sample(remaining_choices, size = 1)
      } else {
        open_door[i] <- remaining_choices
      }
    
      switched_door[i] <- setdiff(doors, c(your_pick[i], open_door[i]))
    }
    
    > mean(your_pick == prizes)
    [1] 0.3305
    > mean(switched_door == prizes)
    [1] 0.6695
    

    The sample.int and sample base functions help simplify things a bit. The remaining_choices item contains the possible doors that can be opened by the game show host, which has a length of 1 or 2 depending on your original choice. If the length is 2, we sample from that vector, and if it's 1, that door is automatically opened.