I'm trying to answer the following question using a simple Monte Carlo sampling procedure in R: An urn contains 10 balls. Two red, three white, and five black. All 10 are drawn one at a time without replacement. Find the probability that both the first and last balls drawn are black.
I've tried two approaches and neither work.
Here's the longer approach that's more intuitive to me:
balls <- c(1:10) #Consider 1-5 black, 6-8 white, and 9-10 red.
pick.ball <- function(balls){
sample(x = balls, 1, replace = FALSE)
}
experiment <- function(n){
picks = NULL
keep <- NULL
for(j in 1:n){
for(i in 1:10){
picks[i] <- pick.ball(balls = balls)
}
keep[j] <- ifelse(picks[1] == any(1:5) & picks[10] == any(1:5), 1, 0)
}
return(length(which(keep == 1))/n)
}
Here's my second, simpler approach which shows my lack of understanding of the repeat loop. Don't bother running it--it just goes on forever. But if someone could help me better understand why, that would be appreciated!
balls <- c(1:10) #Consider 1-5 black, 6-8 white, and 9-10 red.
pick.ball <- function(balls, n){
keep = NULL
for(i in 1:n){
picks <- sample(x = balls, 10, replace = FALSE)
keep[i] <- ifelse(picks[1] == any(1:5) & picks[10] == any(1:5), 1, 0)
repeat{
picks
if(length(keep) == n){
break
}
}
}
return(which(keep == 1)/n)
}
Here is a loop that I created. You can wrap it up in a function if you want. Instead of numbering the balls, I am using letters.
urn <- c(rep("B", 5), rep("W", 3), rep("R", 2))
# Set the number of times you want to run the loop
nloops <- 10000
# Create an empty data frame to hold the outcomes of the simulation
m <- structure(list(first = character(),
last = character(),
is_black = integer()),
class = "data.frame")
Now run the loop
set.seed(456)
for (j in 1:nloops) {
b <- sample(urn, 10, replace = FALSE)
m[j, 1:2 ] <- b[c(1, 10)]
m[j, 3] <- ifelse(m[j, 1] == "B" & m[j, 2] == "B", 1, 0)
}
head(m)
first last is_black
1 B W 0
2 B B 1
3 B B 1
4 R B 0
5 B R 0
6 R W 0
Finally, the answer:
# Proportion of cases where first and last ball drawn were black
sum(m[ , 3]) / nloops
# This was 0.22