r, dataframe, sampling

Stratified sampling on factor


I have a dataset of 1000 rows with the following structure:

     device geslacht leeftijd type1 type2
1       mob        0       53     C     3
2       tab        1       64     G     7
3        pc        1       50     G     7
4       tab        0       75     C     3
5       mob        1       54     G     7
6        pc        1       58     H     8
7        pc        1       57     A     1
8        pc        0       68     E     5
9        pc        0       66     G     7
10      mob        0       45     C     3
11      tab        1       77     E     5
12      mob        1       16     A     1

I would like to draw a sample of 80 rows, composed of 10 rows with type1 = A, 10 rows with type1 = B, and so on. Can anyone help me?


Solution

  • Base R solution:

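    # Split df by type1, draw 10 rows from each group, then stack the pieces.
    # replace = TRUE samples with replacement, so a row can appear more than once;
    # use replace = FALSE for distinct rows (needs at least 10 rows per group).
    do.call(rbind,
            lapply(split(df, df$type1), function(i)
              i[sample(seq_len(nrow(i)), size = 10, replace = TRUE), ]))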
    

    EDIT:

    Other solutions suggested by @BrodieG

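    # Split the row indices by type1, sample 10 indices per group, then subset once.
    with(df, df[unlist(lapply(split(seq_along(type1), type1), sample, 10, TRUE)), ])
    
    # Same idea; sapply() returns a 10-row matrix that c() flattens into a vector.
    with(df, df[c(sapply(split(seq_along(type1), type1), sample, 10, TRUE)), ])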
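
The answers above all sample with replacement (`replace = TRUE`), so the same row can be drawn more than once. If distinct rows are wanted, a minimal sketch along the same lines might look like the following. The toy data frame is made up to stand in for the 1000-row dataset, and `sample_per_group` is a hypothetical helper name, not something from the original answers. It indexes with `sample.int(length(i), ...)` rather than calling `sample(i, ...)` directly, because `sample()` treats a single number `n` as `1:n`, which would give wrong rows for a group that has only one row.

```r
# Toy data standing in for the question's 1000-row data frame (illustrative only).
set.seed(42)
df <- data.frame(
  device   = sample(c("mob", "tab", "pc"), 1000, replace = TRUE),
  geslacht = sample(0:1, 1000, replace = TRUE),
  leeftijd = sample(16:80, 1000, replace = TRUE),
  type1    = sample(LETTERS[1:8], 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw up to n distinct rows from each level of column `by`.
sample_per_group <- function(data, by, n) {
  # Row indices of `data`, grouped by the stratification column.
  idx <- split(seq_len(nrow(data)), data[[by]])
  # Without replacement; groups smaller than n contribute all their rows.
  picked <- lapply(idx, function(i) i[sample.int(length(i), min(n, length(i)))])
  data[unlist(picked, use.names = FALSE), ]
}

strat <- sample_per_group(df, "type1", 10)
table(strat$type1)  # rows drawn per level of type1
```

With eight levels that each have at least 10 rows, `strat` should contain 80 rows, 10 per level. Capping the draw at the group size is a design choice: it avoids the error `sample()` raises when asked for more distinct items than exist, at the cost of a smaller sample for sparse groups.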