Newbie here. My problem has two steps: I would like to sample a number of rows (3) from a data frame and then take a second sample (1 row) that is not in the first sample.
#here is my data frame
df = data.frame(matrix(rnorm(20), nrow=10))
#here is my first sample with 3 rows
sample_1 <- df[sample(nrow(df), 3), ]
#here is my second sample
sample_2 <- df[sample(nrow(df), 1), ]
I want the second sample to not be a part of the first sample.
I appreciate your help. Thank you!
Hello! Thanks once again for the response. I have a follow-up question: if I needed to run this on a large dataset inside a for loop, so that the code ran on every iteration but selected a different group each time, would that be possible?
@GregorThomas' suggestion is likely best, given what we know: sample four rows, then take one row as your sample_2 and the rest as sample_1.
set.seed(42)
df <- data.frame(matrix(rnorm(20), nrow=10))
( samples <- sample(nrow(df), size = 4) )
# [1] 6 8 4 9
sample_1 <- df[ samples[-1], ]
sample_2 <- df[ samples[1],,drop = FALSE ]
sample_1
# X1 X2
# 8 -0.09465904 -2.6564554
# 4 0.63286260 -0.2787888
# 9 2.01842371 -2.4404669
sample_2
# X1 X2
# 6 -0.1061245 0.6359504
However, if for some reason you need to draw the two samples separately, you can restrict the second sampling to rows not included in the first. This is easiest when each row has a unique id of some form:
df$id <- seq_len(nrow(df))
df
# X1 X2 id
# 1 1.37095845 1.3048697 1
# 2 -0.56469817 2.2866454 2
# 3 0.36312841 -1.3888607 3
# 4 0.63286260 -0.2787888 4
# 5 0.40426832 -0.1333213 5
# 6 -0.10612452 0.6359504 6
# 7 1.51152200 -0.2842529 7
# 8 -0.09465904 -2.6564554 8
# 9 2.01842371 -2.4404669 9
# 10 -0.06271410 1.3201133 10
sample_1 <- df[sample(nrow(df), 3), ]
sample_1
# X1 X2 id
# 6 -0.1061245 0.6359504 6
# 2 -0.5646982 2.2866454 2
# 5 0.4042683 -0.1333213 5
subdf <- df[ !df$id %in% sample_1$id, ]
sample_2 <- subdf[sample(nrow(subdf), 1), ]
sample_2
# X1 X2 id
# 7 1.511522 -0.2842529 7
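Regarding the follow-up about running this in a for loop: one possible sketch is to wrap the "sample four, split into three and one" idea in a small function and call it once per iteration. The helper name draw_disjoint, its arguments n1 and n2, and the iteration count n_iter are my own assumptions for illustration and are not part of the original question. Each iteration draws a fresh random pair of samples; this does not by itself guarantee that the groups differ between iterations, only that sample_1 and sample_2 never overlap within one iteration.

```r
# Hypothetical helper (name and arguments are assumptions, not from the question):
# draw n1 + n2 distinct row indices at once, then split them into two samples.
draw_disjoint <- function(data, n1 = 3, n2 = 1) {
  stopifnot(n1 + n2 <= nrow(data))
  # sample.int() samples without replacement by default, so indices are unique
  idx <- sample.int(nrow(data), n1 + n2)
  list(
    sample_1 = data[idx[seq_len(n1)], , drop = FALSE],
    sample_2 = data[idx[n1 + seq_len(n2)], , drop = FALSE]
  )
}

set.seed(42)
df <- data.frame(matrix(rnorm(20), nrow = 10))

n_iter <- 5                          # assumed number of loop iterations
results <- vector("list", n_iter)    # pre-allocate instead of growing in the loop
for (i in seq_len(n_iter)) {
  results[[i]] <- draw_disjoint(df)
}

# Inspect the two samples from the first iteration
results[[1]]$sample_1
results[[1]]$sample_2
```

Because the function works on row indices rather than on an id column, it does not need to modify df. On a large dataset, pre-allocating the results list as above avoids repeatedly copying it as the loop runs.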