Suppose I have the following data set (named data).
id var1 var2
1 A 33
2 B 23
3 A 45
4 A 55
5 B 22
6 A 33
7 B 90
8 A 78
9 B 12
10 A 11
My intention is to add a new column to the original data set that indicates whether each row was randomly selected (1) or not (0). I tried the following:
library(sampling)
data1 <- strata(data,"var1", size=c(4,3),method="srswor") #stratified random sampling
data2 <- getdata(data,data1) # this gives a separate data set
Any help, please? Thanks!
If you look in the documentation of sampling::strata(), you'll find the following information:
The function produces an object, which contains the following information:
ID_unit: the identifier of the selected units.
Stratum: the unit stratum.
Prob: the unit inclusion probability.
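To see these fields for the example data, here is a minimal sketch. It assumes the sampling package is installed; the plain data.frame (instead of the tibble structure) and the set.seed() call are additions only to keep it short and repeatable. With srswor, a unit's inclusion probability is its stratum sample size divided by the stratum size, so Prob should come out as 4/6 for the A rows and 3/4 for the B rows.

```r
library(sampling)

# Same example data as in the question, as a plain data.frame
data <- data.frame(
  id   = 1:10,
  var1 = c("A", "B", "A", "A", "B", "A", "B", "A", "B", "A"),
  var2 = c(33, 23, 45, 55, 22, 33, 90, 78, 12, 11)
)

set.seed(1)  # assumption: fixed seed only so the draw can be repeated
data1 <- strata(data, "var1", size = c(4, 3), method = "srswor")

data1$ID_unit                             # row positions of the selected units in `data`
table(data1$Stratum)                      # number of units drawn per stratum
tapply(data1$Prob, data1$Stratum, unique) # inclusion probability per stratum
```

Because ID_unit holds row positions, data$var1[data1$ID_unit] gives the stratum of each selected row, which is a quick way to confirm the 4/3 split.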
ID_unit can be used to index the original data and set the indicator you asked for (here as TRUE/FALSE):
data <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                       var1 = c("A", "B", "A", "A", "B", "A", "B", "A", "B", "A"),
                       var2 = c(33, 23, 45, 55, 22, 33, 90, 78, 12, 11)),
                  row.names = c(NA, -10L),
                  class = c("tbl_df", "tbl", "data.frame"))
library(sampling)
data1 <- strata(data,"var1", size=c(4,3),method="srswor") #stratified random sampling
data2 <- getdata(data,data1) # this gives a separate data set
data$sampled <- FALSE                   # start with every row marked as not selected
data[data1$ID_unit, "sampled"] <- TRUE  # flag the rows that strata() drew
data
#> id var1 var2 sampled
#> 1 1 A 33 FALSE
#> 2 2 B 23 TRUE
#> 3 3 A 45 FALSE
#> 4 4 A 55 TRUE
#> 5 5 B 22 FALSE
#> 6 6 A 33 TRUE
#> 7 7 B 90 TRUE
#> 8 8 A 78 TRUE
#> 9 9 B 12 TRUE
#> 10 10 A 11 TRUE
Created on 2020-07-28 by the reprex package (v0.3.0)
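Since the question asked for 1/0 rather than TRUE/FALSE, here is a minimal variant of the same idea, assuming the sampling package is installed. It builds the column in one step with %in% and as.integer(); the plain data.frame and the set.seed() value are additions for reproducibility, so the selected rows will differ from the output above.

```r
library(sampling)

data <- data.frame(
  id   = 1:10,
  var1 = c("A", "B", "A", "A", "B", "A", "B", "A", "B", "A"),
  var2 = c(33, 23, 45, 55, 22, 33, 90, 78, 12, 11)
)

set.seed(2020)  # assumption: fixed seed only to make the draw reproducible
data1 <- strata(data, "var1", size = c(4, 3), method = "srswor")

# ID_unit holds row positions in `data`; %in% turns them into TRUE/FALSE
# for every row, and as.integer() maps that to the requested 1/0
data$sampled <- as.integer(seq_len(nrow(data)) %in% data1$ID_unit)

# Cross-check: 4 rows of stratum A and 3 rows of stratum B should be flagged
table(data$var1, data$sampled)
```

If you already have the TRUE/FALSE column from the code above, data$sampled <- as.integer(data$sampled) gives the same 1/0 coding.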