
Add a dummy column to flag whether each row was randomly selected

Suppose I have the following data set (named data).

id var1 var2
1   A   33
2   B   23
3   A   45
4   A   55
5   B   22
6   A   33
7   B   90
8   A   78
9   B   12
10  A   11

I want to add a new column to the original data set that indicates whether each row was randomly selected (1) or not (0). I tried the following.

data1 <- strata(data,"var1", size=c(4,3),method="srswor") #stratified random sampling
data2 <- getdata(data,data1)  # this gives a separate data set

Any help, please? Thanks!


  • If you look in the documentation of sampling::strata() you'll find the following information:

    The function produces an object, which contains the following information:
    ID_unit: the identifier of the selected units.
    Stratum: the unit stratum.
    Prob: the unit inclusion probability.

    ID_unit can be used to subset the original data and assign the boolean you asked for:

    library(sampling)
    data1 <- strata(data, "var1", size = c(4, 3), method = "srswor")  # stratified random sampling
    data2 <- getdata(data, data1)  # this gives a separate data set
    data$sampled <- FALSE
    data[data1$ID_unit, "sampled"] <- TRUE  # ID_unit holds the row positions of the sampled units
    data
    #>    id var1 var2 sampled
    #> 1   1    A   33   FALSE
    #> 2   2    B   23    TRUE
    #> 3   3    A   45   FALSE
    #> 4   4    A   55    TRUE
    #> 5   5    B   22   FALSE
    #> 6   6    A   33    TRUE
    #> 7   7    B   90    TRUE
    #> 8   8    A   78    TRUE
    #> 9   9    B   12    TRUE
    #> 10 10    A   11    TRUE

    Created on 2020-07-28 by the reprex package (v0.3.0)
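
    Since the question asked for a 1/0 column rather than TRUE/FALSE, here is a minimal base-R sketch of the same idea. It does not use the sampling package: the stratified draw is done with split() and sample.int() instead of strata(), and the names n_per_stratum, rows_by_stratum and picked, as well as the seed, are only illustrative assumptions.

    ```r
    # Example data from the question
    data <- data.frame(
      id   = 1:10,
      var1 = c("A", "B", "A", "A", "B", "A", "B", "A", "B", "A"),
      var2 = c(33, 23, 45, 55, 22, 33, 90, 78, 12, 11)
    )

    # Hypothetical named vector: how many rows to draw from each stratum
    n_per_stratum <- c(A = 4, B = 3)

    set.seed(1)  # arbitrary seed, only to make the draw reproducible

    # Row positions grouped by stratum
    rows_by_stratum <- split(seq_len(nrow(data)), data$var1)

    # Within each stratum, draw row positions without replacement
    picked <- unlist(lapply(names(rows_by_stratum), function(s) {
      rows <- rows_by_stratum[[s]]
      rows[sample.int(length(rows), n_per_stratum[[s]])]
    }))

    # 1 if the row was drawn, 0 otherwise
    data$sampled <- as.integer(seq_len(nrow(data)) %in% picked)
    ```

    The last line also works with the strata() result: replacing picked with data1$ID_unit should give the same kind of 1/0 flag, because ID_unit is used above as the row positions of the sampled units.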