Sample replication


I have a data frame (d) with 640 observations of 55 variables.

I would like to randomly split this data frame into 10 sub data frames of 64 observations each (all 55 variables). I don't want any observation to appear in more than one sub data frame.

This code works for one sample:

d1 <- d[sample(nrow(d),64,replace=F),]

How can I repeat this ten times?

This one gives me a data frame of 10 variables (each one is one sample):

d1 <- replicate(10,sample(nrow(d),64,replace = F))

Can anyone help me?


Solution

  • Here's a solution that returns the result in a list of data.frames:

    d <- data.frame(A=1:640, B=sample(LETTERS, 640, replace=TRUE)) # an example data.frame
    idx <- sample(rep(1:10, length.out=nrow(d)))
    res <- split(d, idx)
    res[[1]] # first data frame
    res[[10]] # last data frame
    

    The only tricky part is creating idx. idx[i], a value in {1,...,10}, names the resulting data.frame that receives the ith row of d. Because each row gets exactly one label, no row can end up in more than one data.frame.

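    Also, note that rep(1:10, length.out=nrow(d)) yields (1,2,...,10,1,2,...,10,...), and sample returns a random permutation of it. Each label occurs 64 times, so every data.frame gets exactly 64 rows.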

    Another approach is to use:

    apply(matrix(sample(nrow(d)), ncol=10), 2, function(idx) d[idx,])
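
If you want to convince yourself that both approaches really partition the rows, a quick check along these lines may help. This is only a sketch on the example data.frame from above: the seed and the names m, res2, rows1 and rows2 are my own choices for illustration, and I've used lapply in place of apply in the second approach so that the result is always a plain list.

```r
set.seed(42)  # assumption: fixed seed, only so the check is reproducible

# Example data with the same number of rows as in the question
d <- data.frame(A = 1:640, B = sample(LETTERS, 640, replace = TRUE))

# Approach 1: shuffle the labels 1..10 (64 of each), then split by label
idx <- sample(rep(1:10, length.out = nrow(d)))
res <- split(d, idx)

# Approach 2: shuffle the row numbers and lay them out in 10 columns;
# each column of m holds the row numbers of one sub data frame
m <- matrix(sample(nrow(d)), ncol = 10)
res2 <- lapply(seq_len(ncol(m)), function(j) d[m[, j], , drop = FALSE])

# Collect the row ids (column A) that ended up in each result
rows1 <- unlist(lapply(res, function(x) x$A), use.names = FALSE)
rows2 <- unlist(lapply(res2, function(x) x$A), use.names = FALSE)

# Ten pieces of 64 rows each, no row used twice, no row left out
stopifnot(
  length(res) == 10, all(sapply(res, nrow) == 64),
  length(res2) == 10, all(sapply(res2, nrow) == 64),
  anyDuplicated(rows1) == 0, identical(sort(rows1), 1:640),
  anyDuplicated(rows2) == 0, identical(sort(rows2), 1:640)
)
```

If stopifnot returns silently, every row of d landed in exactly one of the ten sub data frames. By contrast, replicate(10, sample(nrow(d), 64)) draws each sample independently, so nothing stops the same row from being picked in several samples.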