
Replicate a procedure: generate m dataframes, each one with a variable that includes random values, and append them, in R


What is the most efficient way to generate m dataframes, each with a variable that holds random values, and append them into one?

Here's an example:

df <- data.frame(id = 1:10, var = sample(1:500, 10, replace=TRUE))


id var

1  65
2  123
3  42
4  16
5  463
6  129
7  367
8  99
9  489
10 63

If m = 2, two dataframes should be generated and appended, having:

id var
    
 1  65
 2  123
 3  42
 4  16
 5  463
 6  129
 7  367
 8  99
 9  489
 10 63
 1  321
 2  410
 3  78
 4  166
 5  320
 6  478
 7  231
 8  100
 9  105
 10 206

Solution

  • Put the code that generates the dataframe in a function:

    fun <- function() {
      df <- data.frame(id = 1:10, var = sample(1:500, 10, replace=TRUE))  
      df
    }
    

    There are then several ways to call this function m times and bind the results by row.

    Base R using replicate

    m <- 2
    do.call(rbind, replicate(m, fun(), simplify = FALSE))
    

    Base R using lapply

    # \(x) is the anonymous-function shorthand, available from R 4.1
    do.call(rbind, lapply(seq_len(m), \(x) fun()))
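
    The index that lapply passes in is unused above. If it is useful to know which replicate each row came from, one possible variant passes that index through. This is only a sketch: fun_i and rep_id are hypothetical names, not part of the original answer.

    ```r
    # Hypothetical variant of fun() that records the replicate number
    fun_i <- function(i) {
      data.frame(rep_id = i,
                 id = 1:10,
                 var = sample(1:500, 10, replace = TRUE))
    }

    m <- 2
    # Each call receives its own index, so the rows stay labeled after binding
    out <- do.call(rbind, lapply(seq_len(m), fun_i))
    ```

    The result has one extra column, rep_id, which takes the value 1 for the first ten rows and 2 for the next ten.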
    

    purrr::map_df (superseded since purrr 1.0.0 in favor of map() followed by list_rbind(), but still available)

    purrr::map_df(seq_len(m), ~fun())
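
Since the question asks about efficiency: because the values are drawn independently with replacement, it should be possible to skip building m separate dataframes altogether, drawing all n * m values in one call and repeating the ids. This is an alternative sketch, not one of the approaches listed above; n, df_all and the seed are assumptions added for illustration.

```r
set.seed(42)  # assumption: fixed seed, only so the result is reproducible

m <- 2   # number of replicates
n <- 10  # rows per replicate

# One call to sample() draws every value; ids 1..n repeat once per replicate
df_all <- data.frame(
  id  = rep(seq_len(n), times = m),
  var = sample(1:500, n * m, replace = TRUE)
)
```

This avoids the repeated rbind step, which tends to matter only when m is large. It relies on every replicate having the same shape, so it does not generalize to functions that build more varied dataframes.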