
Assign a specific number of random rows into datasets in R


I have a dataset with 54285 observations. I need to randomly assign 50% of the rows to one data frame, 30% to a second, and the remaining 20% to a third, with no row appearing in more than one of them. Here is a small example:

    data <- data.frame(numbers = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
    data
    #   numbers
    #1        1
    #2        2
    #3        3
    #4        4
    #5        5
    #6        6
    #7        7
    #8        8
    #9        9
    #10      10
The result I expect looks something like this (the exact rows will differ, since the assignment is random):

    df1: 5, 3, 8, 1, 7
    df2: 2, 4, 9
    df3: 6, 10


Solution

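  • Multiply each proportion by the number of rows to get the group sizes, repeat the labels 1, 2, 3 that many times with `rep`, shuffle them with `sample`, and `split` the data on the shuffled labels to get separate data frames.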

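    set.seed(123)
    n <- nrow(data)
    # Group sizes: floor the first two and give the remainder to the last,
    # so they always sum to n (rep() would silently truncate e.g. 54285 * 0.5)
    sizes <- floor(n * c(0.5, 0.3))
    sizes <- c(sizes, n - sum(sizes))
    # Shuffle the labels (5 ones, 3 twos, 2 threes for 10 rows) and split on them
    result <- split(data, sample(rep(1:3, sizes)))
    names(result) <- paste0('df', seq_along(result))
    list2env(result, .GlobalEnv)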
    
    df1
    
    #   numbers
    #1        1
    #3        3
    #7        7
    #9        9
    #10      10
    
    df2
    #  numbers
    #4       4
    #5       5
    #8       8
    
    df3
    #  numbers
    #2       2
    #6       6
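As a sanity check at the asker's actual size, the sketch below builds a hypothetical 54285-row data frame (`big` and the helper `split_sizes` are illustrative names, not from the question), splits it with the `rep`/`sample`/`split` approach, and confirms that the three pieces are disjoint and together cover every row. Because 54285 × 0.5 and 54285 × 0.3 are not whole numbers, the helper floors those sizes and gives the remainder to the last group.

```r
# Hypothetical helper: integer group sizes that always sum to n.
# All groups but the last are floored; the last absorbs the remainder.
split_sizes <- function(n, props) {
  sizes <- floor(n * props[-length(props)])
  c(sizes, n - sum(sizes))
}

set.seed(123)
big <- data.frame(numbers = seq_len(54285))
sizes <- split_sizes(nrow(big), c(0.5, 0.3, 0.2))
parts <- split(big, sample(rep(1:3, sizes)))
names(parts) <- paste0('df', seq_along(parts))

# Rows per part: 27142, 16285 and 10858
sapply(parts, nrow)

# No row appears in more than one part, and every row is used
all_rows <- unlist(lapply(parts, `[[`, "numbers"))
anyDuplicated(all_rows) == 0
setequal(all_rows, big$numbers)
```

The rounding choice is arbitrary; any rule works as long as the sizes add up to `nrow(big)`, otherwise `split` recycles the shorter label vector with a warning.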
    

For large data frames, `sample` with the `prob` argument also works. Note, however, that it assigns each row to a group independently, so the group sizes are only approximately 50/30/20 and will usually not match the exact counts produced by the `rep` approach above.

    result <- split(data, sample(1:3, nrow(data), replace = TRUE, prob = c(0.5, 0.3, 0.2)))
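To see how far the `prob` version drifts from the target proportions, a rough sketch is to tabulate the labels it draws for a hypothetical 54285 rows. The exact counts depend on the seed, so only their total and the approximate shares are worth relying on.

```r
set.seed(123)
n <- 54285
# Each row independently gets label 1, 2 or 3 with probabilities 0.5, 0.3, 0.2
groups <- sample(1:3, n, replace = TRUE, prob = c(0.5, 0.3, 0.2))
counts <- table(groups)
counts                        # close to, but usually not exactly, 50% / 30% / 20% of n
round(prop.table(counts), 3)
```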