I have a dataset with 54285 observations. What I need is to assign randomly 50% of the rows into another dataframe, 30% into another dataset, and the rest (20%) into another one. This should be done without duplicates. This is an example:
data<-data.frame(numbers=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
data
1
2
3
4
5
6
7
8
9
10
df1
5
3
8
1
7
df2
2
4
9
df3
6
10
Multiply the ratio by number of rows in the dataset and split
the data to divide them in separate dataframes.
set.seed(123)
result <- split(data, sample(rep(1:3, nrow(data) * c(0.5, 0.3, 0.2))))
names(result) <- paste0('df', seq_along(result))
list2env(result, .GlobalEnv)
df1
# numbers
#1 1
#3 3
#7 7
#9 9
#10 10
df2
# numbers
#4 4
#5 5
#8 8
df3
# numbers
#2 2
#6 6
For large dataframes using sample
with prob
argument should work as well. However, note that this might not give you exact number of rows that you expect like the above rep
answer.
result <- split(data, sample(1:3, nrow(data), replace = TRUE, prob = c(0.5, 0.3, 0.2)))