Tags: r, conditional-statements, row, probability, rowwise

Perform a random binomial draw for each row in R without rowwise()


I have an R data frame and need to perform a random binomial draw for each row. The n = argument of the draw should be based on the value in a column of that row. Further, this operation should sit inside a case_when() that depends on a condition in the data.

Note: dplyr's rowwise() is much too slow here, because the data frame is large and the operation is performed at each timestep of a simulation model. Is there a way to do this quickly and efficiently?

Example:

library(tidyverse)

df = data.frame(condition = c("A","B","A","B","C"),
                number = c(1000,1000,1000,1000,1))
prob1 = 0.000517143
prob2 = 0.000213472


set.seed(1)
df = df %>% 
  mutate(output = case_when(condition == "A" ~ sum(rbinom(n = number,
                                                          size = 1,
                                                          prob = prob1)),
                            condition == "B" ~ sum(rbinom(n = number,
                                                          size = 1,
                                                          prob = prob2)),
                            TRUE ~ 0))
print(df)
#>   condition number output
#> 1         A   1000      0
#> 2         B   1000      0
#> 3         A   1000      0
#> 4         B   1000      0
#> 5         C      1      0

Here, it looks like the random binomial draws are being reused and returning all zeros.

As a check, here it is sampled repeatedly. The expected value of sum(df$output) is about 1.46 (2 × 1000 × prob1 + 2 × 1000 × prob2), so it should not be 0 on every draw.

for(i in 1:10){
  df = df %>% 
    mutate(output = case_when(condition == "A" ~ sum(rbinom(n = number,
                                                            size = 1,
                                                            prob = prob1)),
                              condition == "B" ~ sum(rbinom(n = number,
                                                            size = 1,
                                                            prob = prob2)),
                              TRUE ~ 0))
  print(sum(df$output))}
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0
#> [1] 0

Unsure of the way forward.


Solution

  • Why are you summing draws of size 1? Refer to Wikipedia:

    In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p).

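    Thus, you can sample once per row and don't need to sum: a single draw with size = number is already the count of successes in number trials. The original code returns zeros because rbinom() treats a vector n of length > 1 as a request for length(n) draws, so each case_when() branch sums only five Bernoulli draws into one scalar, which is then recycled to every row. Since rbinom is vectorized over size and prob, you don't need a loop.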

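    # Attach each row's probability via a lookup table. Note that merge()
    # sorts the result by condition, so the row order changes.
    df <- merge(df,
                data.frame(condition = c("A", "B"),
                           prob = c(0.000517143, 0.000213472)),
                by = "condition", all.x = TRUE)
    # Conditions without an entry in the lookup table never succeed.
    df[is.na(df$prob), "prob"] <- 0

    set.seed(1)
    # One vectorized call: row i gets one draw with size = number[i], prob = prob[i].
    df$output <- with(df, rbinom(length(number), size = number, prob = prob))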
    
    #  condition number        prob output
    #1         A   1000 0.000517143      0
    #2         A   1000 0.000517143      0
    #3         B   1000 0.000213472      0
    #4         B   1000 0.000213472      1
    #5         C      1 0.000000000      0
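
As the output above shows, merge() sorts the rows by condition, so the two A rows now come first. If the simulation relies on the original row order, a named lookup vector is one possible alternative. The following is a minimal sketch in base R, not the answer's own method: the names probs, p, expected and sims are introduced here for illustration, and the 10,000 repetitions are an arbitrary choice for the sanity check.

```r
# Same toy data as in the question.
df <- data.frame(condition = c("A", "B", "A", "B", "C"),
                 number = c(1000, 1000, 1000, 1000, 1))

# Hypothetical lookup vector: condition -> success probability.
probs <- c(A = 0.000517143, B = 0.000213472)

# Index by name; conditions missing from the lookup give NA, set those to 0.
p <- unname(probs[as.character(df$condition)])
p[is.na(p)] <- 0

set.seed(1)
# One vectorized call: row i gets a single Binomial(number[i], p[i]) draw.
# The row order of df is left untouched.
df$output <- rbinom(nrow(df), size = df$number, prob = p)

# Sanity check: averaged over many repetitions, sum(output) should be
# close to its expected value sum(number * p).
expected <- sum(df$number * p)
sims <- replicate(10000, sum(rbinom(nrow(df), size = df$number, prob = p)))
print(c(expected = expected, simulated = mean(sims)))
```

The same rbinom() call should also work inside mutate(), with prob built by case_when() on condition, since only the probability depends on the condition and not the draw itself.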