
Create 3 mutually exclusive samples in R


I have a data set that I need to split into three mutually exclusive random samples of different sizes. I used:

testdata<-sample(47959,14388,replace=FALSE,prob=NULL)

to try to create a sample (the data set has 47959 rows), but I don't know how to turn this sample into a data set I can manipulate in R.


Solution

  • Some data:

    set.seed(42)
    x <- sample(20, size=100, replace=TRUE)
    head(x)
    ## [1] 19 19  6 17 13 11
    

    Random sizes

    Create an index vector with one value from 1:3 per element, and use it to split your data:

    i <- sample(1:3, size=length(x), replace=TRUE)
    head(i)
    ## [1] 2 1 1 2 3 3
    

    Now break it into the three groups (many ways to do this):

    x.grouped <- split(x, i)
    str(x.grouped)
    ## List of 3
    ##  $ 1: int [1:31] 19 6 15 20 9 5 8 9 18 20 ...
    ##  $ 2: int [1:30] 19 17 14 10 6 10 19 3 10 19 ...
    ##  $ 3: int [1:39] 13 11 15 3 15 19 12 2 8 19 ...
    

    The relative sizes of the three groups will vary randomly.
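
    The sizes vary because each element is assigned to a group independently. As a minimal sketch, the realised sizes can be inspected with table(), and the prob argument of sample() can tilt the expected proportions. The 20/50/30 weights below are illustrative assumptions, not values from the question, and the realised counts will still differ from them:

    ```r
    set.seed(42)
    x <- sample(20, size = 100, replace = TRUE)

    # Assign each element to group 1, 2 or 3 with unequal probabilities.
    # The weights are illustrative; only the expected proportions follow them.
    i <- sample(1:3, size = length(x), replace = TRUE, prob = c(0.2, 0.5, 0.3))

    # Realised group sizes: they sum to length(x) but change from run to run.
    group.sizes <- table(i)
    print(group.sizes)

    # Split the data by the group labels, as above.
    x.grouped <- split(x, i)
    ```

    If exact sizes are required, random assignment like this is not enough.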

    Controlled sizes

    The vector indices holds the desired size of each group (sizes, not positions, despite the name); the sizes should sum to length(x).

    indices <- c(20, 50, 30)
    indices.cs <- cumsum(indices)
    x.unsorted <- sample(x)
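    # Start positions 1, 21, 71: the cumulative sums shifted right by one.
    # (lag() with a default= argument is dplyr's; this is a base R equivalent.)
    starts <- 1 + c(0, head(indices.cs, -1))
    xs.grouped.sized <- mapply(function(a,b) x.unsorted[a:b],
        starts,
        indices.cs,
        SIMPLIFY=FALSE)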
    str(xs.grouped.sized)
    ## List of 3
    ##  $ : int [1:20] 2 7 13 1 19 7 14 20 19 1 ...
    ##  $ : int [1:50] 13 6 19 4 19 20 20 11 17 3 ...
    ##  $ : int [1:30] 1 10 7 16 9 16 17 11 14 8 ...
    

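    Edit: Updated implementation

    A shorter way to get controlled sizes: build a vector of group labels with the desired counts, shuffle it, and split on it.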

    indices <- sample(rep(1:3, times = c(20,50,30)))
    str(split(x, indices))
    ## List of 3
    ##  $ 1: int [1:20] 6 3 10 6 10 20 17 8 5 13 ...
    ##  $ 2: int [1:50] 19 19 17 15 14 15 19 20 3 19 ...
    ##  $ 3: int [1:30] 13 11 15 19 10 12 3 11 14 1 ...
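
    The same idea carries over to the rows of a data frame, which is closer to what the question asks for. The following is only a sketch: mydata, its columns and the group sizes are hypothetical stand-ins for the real 47959-row data set.

    ```r
    set.seed(42)

    # Hypothetical stand-in for the real data set.
    mydata <- data.frame(id = 1:100, value = rnorm(100))

    # Desired group sizes (assumed here); they must sum to nrow(mydata).
    sizes <- c(20, 50, 30)
    stopifnot(sum(sizes) == nrow(mydata))

    # One shuffled group label per row, so each row lands in exactly one group.
    groups <- sample(rep(seq_along(sizes), times = sizes))

    # split() on a data frame returns a list of data frames, one per group.
    samples <- split(mydata, groups)
    sapply(samples, nrow)
    ```

    Each element, for example samples[[1]], is an ordinary data frame. A single index sample like the one in the question can be used the same way: mydata[testdata, ] selects the sampled rows and mydata[-testdata, ] the remaining ones.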