
How to randomly sample a dataset in R so that rows with the same transaction ID stay together


My sample data set looks like this:

transactionID   desc
1   a
1   d
1   a
2   c
2   d
3   l
3   g
3   h
5   h
5   b
5   h
5   f
6   d
7   f
7   v
7   f
8   f
8   d

The sampling result should be either

1   a
1   d
1   a
2   c
2   d
3   l
3   g
3   h

or

5   h
5   b
5   h
5   f
6   d
7   f
7   v
7   f
8   f
8   d

The exact sample values are not important; they can be anything. The one constraint is that all rows with the same transaction ID must end up in the same sample. How can I do this?


Solution

  • You can sample the unique transaction IDs and then keep every row whose ID was drawn:

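     n <- 2  # number of transaction IDs to draw
     # Draw n distinct IDs, then keep every row that belongs to one of them
     df[with(df, transactionID %in%
             sample(unique(transactionID), n, replace=FALSE)),]
     # Example output (varies between runs):
     #      transactionID desc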
     #1              1    a
     #2              1    d
     #3              1    a
     #17             8    f
     #18             8    d
    

    Data

     df <- structure(list(transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
     3L, 5L, 5L, 5L, 5L, 6L, 7L, 7L, 7L, 8L, 8L), desc = c("a", "d", 
     "a", "c", "d", "l", "g", "h", "h", "b", "h", "f", "d", "f", "v", 
     "f", "f", "d")), .Names = c("transactionID", "desc"), class = "data.frame",
     row.names = c(NA,-18L))
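
The two example results in the question look like a split of the whole data set into two samples rather than a single draw. The same idea can be extended to that case. The following is only a minimal sketch: the seed, the roughly 50/50 ratio, and the names `ids`, `picked`, `sample_a`, and `sample_b` are illustrative assumptions, not part of the original answer.

```r
# Same data as above, rebuilt with data.frame() for brevity.
df <- data.frame(
  transactionID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L,
                    6L, 7L, 7L, 7L, 8L, 8L),
  desc = c("a", "d", "a", "c", "d", "l", "g", "h", "h", "b", "h", "f",
           "d", "f", "v", "f", "f", "d"),
  stringsAsFactors = FALSE
)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Sample at the level of transaction IDs, not rows.
ids <- unique(df$transactionID)

# Assumed ratio: about half of the IDs go into the first sample.
picked <- sample(ids, size = floor(length(ids) / 2))

# Every row follows its ID, so no transaction is split across samples.
sample_a <- df[df$transactionID %in% picked, ]
sample_b <- df[!df$transactionID %in% picked, ]
```

Because the draw is made on IDs, the two samples can differ in row count even when they hold a similar number of transactions. Note also that `sample(ids, ...)` treats a single numeric value `x` as `1:x`, so this sketch assumes there are at least two distinct IDs.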