
Repeating Samples and Adding them to a Dataframe


I have a vector of names. I am trying to take repeated samples (n = 1000) from the names and add each sample as a new column to a data frame in R.

library(dplyr)  # mutate() comes from dplyr

names <- c("A", "B", "3", "4", "5", "6", "7", "8", "9", "10")
df <- data.frame(names)

for(i in 1:1000) {
  output <- sample(names, size = 10, replace = FALSE)
  df <- mutate(df, output)
}

Unfortunately, I only get one output column instead of 1000. How can I fix this?


Solution

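  • You only get one column because mutate(df, output) names the new column `output`, so every iteration overwrites the same column. Use cbind() or similar instead, like so. setNames() is also needed to give each new column a unique name (output1, output2, ...) so the names don't clash.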

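    set.seed(42)
    for(i in 1:5) {  # 5 samples for display; use 1:1000 for the full problem
      output <- sample(names, size=length(names), replace=FALSE)
      # bind the sample as a new column, then rename so column names stay unique
      df <- setNames(cbind.data.frame(df, output), c(names(df), paste0("output", i)))
    }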
    df
    #    names output1 output2 output3 output4 output5
    # 1      A       A       8       9       3       5
    # 2      B       5       7      10       A       4
    # 3      3      10       4       3       B       B
    # 4      4       8       A       4       6       8
    # 5      5       B       5       5      10       3
    # 6      6       4      10       6       8       A
    # 7      7       6       B       A       4      10
    # 8      8       9       6       B       5       7
    # 9      9       7       9       8       7       6
    # 10    10       3       3       7       9       9
    

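    Or, better, do this without a loop: replicate() draws all R samples at once and returns them as a matrix with one sample per column. This is more concise and typically faster than growing a data frame inside a loop: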

    set.seed(42)
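    R <- 5  # number of samples
    # replicate() gives a matrix (one sample per column); `colnames<-` names its columns
    cbind(df, `colnames<-`(replicate(R, sample(names)), paste0("output", 1:R)))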
    #    names output1 output2 output3 output4 output5
    # 1      A       A       8       9       3       5
    # 2      B       5       7      10       A       4
    # 3      3      10       4       3       B       B
    # 4      4       8       A       4       6       8
    # 5      5       B       5       5      10       3
    # 6      6       4      10       6       8       A
    # 7      7       6       B       A       4      10
    # 8      8       9       6       B       5       7
    # 9      9       7       9       8       7       6
    # 10    10       3       3       7       9       9
    

    Note: I use `colnames<-` here, which is the matrix equivalent of setNames. You could also write cbind(df, setNames(replicate(R, sample(names), simplify=FALSE), paste0("output", 1:R))), but it's more to type.
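A minimal sketch that scales both approaches up to the 1000 samples from the question and checks that they agree. It uses base R only. The loop variant assigns each sample with `df[[paste0("output", i)]] <-` instead of cbind() + setNames(); that is a swapped-in alternative, not the answer's code. The object names `df_loop`, `df_vec` and `samples` are illustrative.

```r
# Base R only: no packages needed.
names <- c("A", "B", "3", "4", "5", "6", "7", "8", "9", "10")
R <- 1000  # number of samples asked for in the question

# Loop version: assign each permutation to its own, uniquely named column,
# so no iteration overwrites an earlier one.
set.seed(42)
df_loop <- data.frame(names, stringsAsFactors = FALSE)
for (i in seq_len(R)) {
  df_loop[[paste0("output", i)]] <- sample(names)
}

# Loop-free version: replicate() returns a 10 x R character matrix,
# one permutation per column.
set.seed(42)
samples <- replicate(R, sample(names))
colnames(samples) <- paste0("output", seq_len(R))
df_vec <- cbind(data.frame(names, stringsAsFactors = FALSE), samples)

dim(df_loop)
# [1]   10 1001

# Same seed and same sequence of sample() calls, so both draw the same permutations.
all(as.matrix(df_loop[-1]) == samples)
# [1] TRUE
```

Because both versions make the same sequence of `sample()` calls after `set.seed(42)`, they produce identical columns. The loop-free version avoids modifying the data frame once per iteration, which tends to matter more as R grows.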