Matching vector values by records in a data frame in R


I have a vector of values r as follows:

 r<-c(1,3,4,6,7)

and a data frame df with 21 records and two columns:

 id<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
 freq<-c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
 df<-data.frame(id,freq)

Using the vector r, I need to extract a sample of records from df (as a new data frame) such that the freq values of the sampled records equal the values in r. If several records share the same freq value, one of them should be picked at random. For instance, one possible outcome is:

   id     freq
   12         1
   10         3
   4          4
   7          6
   8          7

I would be thankful if anyone could help me with this.


Solution

  • You could try data.table

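    library(data.table)
    # Convert df to a data.table by reference, keep rows whose freq is in r,
    # then draw one id at random within each freq group.
    setDT(df)[freq %in% r, sample(id, 1L), by = freq]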
    

    Or using base R

    # One randomly chosen id per freq value found in r (1L is the sample size)
    aggregate(id ~ freq, df, subset = freq %in% r, FUN = sample, 1L)
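
One caveat applies to both calls above when a freq group holds a single record: sample(x, 1) treats a length-one numeric x >= 1 as 1:x, so it could return an id that is not in the group. The data in the question has no such group for the values in r, but a minimal sketch of a safer pick, assuming a hypothetical helper named pick_one (not part of base R or data.table) and an illustrative vector r2, might look like this:

```r
# Hypothetical helper: pick one element of x uniformly at random,
# indexing by position so a length-one x is returned unchanged.
pick_one <- function(x) x[sample.int(length(x), 1L)]

# Data from the question, rebuilt as a plain data frame
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)

# Illustrative vector: freq == 5 occurs only once in df (id 5),
# so the only valid pick for that value is id 5.
r2  <- c(1, 5, 7)
out <- aggregate(id ~ freq, df, subset = freq %in% r2, FUN = pick_one)
out
```

The same helper can be used in the data.table call in place of sample(id, 1L).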
    

    Update

    If the vector 'r' contains duplicate values and you want to sample from the data set ('df') as many records for each value as the number of times it occurs in 'r':

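      r <- c(1, 3, 3, 4, 6, 7)
      # split(r, r) groups equal values, so length(x) is how often a value occurs
      res <- do.call(rbind, lapply(split(r, r), function(x) {
               x1 <- df[df$freq %in% x, ]
               # draw length(x) distinct rows; errors if fewer rows are available
               x1[sample(seq_len(nrow(x1)), length(x), replace = FALSE), ]}))
      row.names(res) <- NULL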
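
The update above can also be wrapped in a function. The following is only a sketch under stated assumptions: the name sample_by_freq, the explicit error message, and the seed are illustrative choices, not part of the original answer. It checks that enough records exist for each value before sampling, and the last line tabulates the sampled freq values so they can be compared with 'r'.

```r
# Data from the question, rebuilt as a plain data frame
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)

# Hypothetical wrapper: for each distinct value in r, draw as many
# distinct rows of df with that freq as the value occurs in r.
sample_by_freq <- function(df, r) {
  pieces <- lapply(split(r, r), function(x) {
    rows <- which(df$freq %in% x)
    if (length(rows) < length(x)) {
      stop("not enough records with freq = ", x[1])
    }
    # Index into `rows` by position so a single candidate row is handled correctly
    df[rows[sample.int(length(rows), length(x))], , drop = FALSE]
  })
  res <- do.call(rbind, pieces)
  row.names(res) <- NULL
  res
}

set.seed(1)  # arbitrary seed, only to make the draw repeatable
res <- sample_by_freq(df, c(1, 3, 3, 4, 6, 7))
table(res$freq)  # counts per freq value, to compare with the counts in r
```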