I have a dataframe A like this one:
SNP X Y
rs1 5 aa
rs2 1 bb
rs3 6 aa
rs4 2 bb
rs7 11 ft
rs8 3 hg
rs9 1.2 ff
rs10 2.2 cc
rs11 2.2 yh
rs362 3.2 hyu
Using R, I want to select rows under two conditions: (1) keep all rows with X >= 5; (2) randomly sample, without replacement, 2 rows from those with X > 0 and X < 5. I would get something like this:
SNP X Y
rs1 5 aa
rs2 1 bb
rs3 6 aa
rs7 11 ft
rs9 1.2 ff
rs362 3.2 hyu
I am trying something like:
A.1 = A[A$X >= 5,]
B.2 = A[sample(nrow(A), 2), ]
We can use the which function (the data frame is called d here; its definition is at the end):
set.seed(1) # reproducible
d[c(which(d$X >= 5), sample(which(d$X > 0 & d$X < 5), 2)),]
SNP X Y
1 rs1 5.0 aa
3 rs3 6.0 aa
5 rs7 11.0 ft
2 rs2 1.0 bb
7 rs9 1.2 ff
which(d$X >= 5) finds the rows in your data where X >= 5. Then, we find the rows where X > 0 & X < 5 using which again, and sample 2 from those rows. We then concatenate these two vectors of row indexes together.
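If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: sample_rows, n and cutoff are names I made up for illustration, and it assumes the column is called X as in the question. It also sidesteps a quirk of sample(): when given a single number x, sample(x, n) draws from 1:x, so here the positions within the pool are sampled instead of the pool itself.

```r
# Hypothetical helper (not part of the answer above): keep every row with
# X >= cutoff and add n randomly chosen rows with 0 < X < cutoff.
sample_rows <- function(df, n = 2, cutoff = 5) {
  keep <- which(df$X >= cutoff)
  pool <- which(df$X > 0 & df$X < cutoff)
  if (length(pool) < n) {
    stop("fewer than n rows satisfy 0 < X < cutoff")
  }
  # Sample positions within pool rather than pool itself, so a pool of
  # length 1 is not misread by sample() as the range 1:pool.
  picked <- pool[sample.int(length(pool), n)]
  df[c(keep, picked), ]
}

# Same data as in the question.
A <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4", "rs7", "rs8", "rs9", "rs10", "rs11", "rs362"),
  X = c(5, 1, 6, 2, 11, 3, 1.2, 2.2, 2.2, 3.2),
  Y = c("aa", "bb", "aa", "bb", "ft", "hg", "ff", "cc", "yh", "hyu")
)

set.seed(1) # reproducible
res <- sample_rows(A)
res
```

The kept rows come first, followed by the sampled ones. If you would rather keep the original row order, wrapping the index vector in sort(), as in df[sort(c(keep, picked)), ], should do it.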
Data:
d <- structure(list(SNP = c("rs1", "rs2", "rs3", "rs4", "rs7", "rs8",
"rs9", "rs10", "rs11", "rs362"),
X = c(5, 1, 6, 2, 11, 3, 1.2,
2.2, 2.2, 3.2),
Y = c("aa", "bb", "aa", "bb", "ft", "hg", "ff",
"cc", "yh", "hyu")),
class = "data.frame",
row.names = c(NA, -10L))