I have the following dataset:
id1<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
status<-c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df<-data.frame(id1,status)
In df, status is '2' for 40% of my observations.
I am looking for a function to extract a sample of 10 observations from df
while maintaining the above proportion.
I have already seen "Stratified random sampling from data frame in R", but it does not address preserving the proportions.
You can try the stratified function from my "splitstackshape" package:
library(splitstackshape)
stratified(df, "status", 10/nrow(df))
# id1 status
# 1: 5 1
# 2: 12 1
# 3: 2 1
# 4: 1 1
# 5: 6 1
# 6: 9 1
# 7: 16 2
# 8: 17 2
# 9: 18 2
# 10: 15 2
Alternatively, using sample_frac from "dplyr":
library(dplyr)
df %>%
  group_by(status) %>%
  sample_frac(10/nrow(df))
Both of these take a stratified sample proportional to the original grouping variable (hence the use of 10/nrow(df), or, equivalently, 0.5).
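
If you would rather not load a package, the same idea can be sketched in base R: split the row indices by status, sample the same fraction within each group, and then check the resulting proportions. This is only a minimal sketch. The seed and the names frac, idx, and smp are my own illustrative choices, and because the per-group sizes are rounded, the total may not come out to exactly 10 for other group sizes.

```r
# Rebuild the example data from the question
id1 <- 1:20
status <- rep(c(1, 2), times = c(12, 8))
df <- data.frame(id1, status)

set.seed(42)                 # arbitrary seed, only for reproducibility
frac <- 10 / nrow(df)        # sampling fraction (0.5 here)

# Split row indices by status and sample the same fraction within each group.
# sample.int() on the group length avoids the sample(x, n) pitfall when a
# group contains a single row index.
idx <- unlist(lapply(split(seq_len(nrow(df)), df$status), function(rows) {
  rows[sample.int(length(rows), round(length(rows) * frac))]
}))
smp <- df[idx, ]

# Check that the 60/40 split is preserved
table(smp$status)
prop.table(table(smp$status))
```

With the example data this gives 6 rows with status 1 and 4 rows with status 2, so status '2' still accounts for 40% of the sample.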