Tags: r, data.table, sample

Random sampling of two data tables with condition


I am trying to sample two data tables on a condition, combine the columns of the two resulting samples, repeat these steps, and append all the resulting samples to one new data table. Extracts of the two tables (they do not have the same length):

data1
   month1 year
1: 1    2014
2: 2    2015
3: 3    2016
..

data2
   month2    
1: 4   
2: 5    
3: 6   
..
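To make the steps below reproducible, here is a minimal sketch that builds two hypothetical tables with the same column names as the extracts above (month1 and year in data1, month2 in data2). The values are invented for illustration and are not the real data.

```r
library(data.table)

# Hypothetical toy tables shaped like the extracts above:
# data1 has one row per (month1, year) pair, data2 a single column month2.
data1 <- data.table(month1 = rep(1:12, times = 3),
                    year   = rep(2014:2016, each = 12))
data2 <- data.table(month2 = 1:12)
```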

First sample: s1 = sample(data1[month1 == i], 100, replace=TRUE), where i goes from 1 to n.

Second sample: s2 = sample(data2[month2 > i], 100, replace=TRUE), where the month drawn from data2 must be greater than the month i selected for s1.

The two samples should then be combined column-wise into a new data table, e.g. dt1 = cbind(s1, s2).
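A single iteration of these three steps might look like the sketch below, assuming the hypothetical toy tables and a fixed month i = 3. One detail matters here: sample() applied to a table draws from length(x), which for a data.table is the number of columns, so the sketch samples row indices with sample.int() instead. The names n_draws, sub1 and sub2 are illustrative only.

```r
library(data.table)

# Hypothetical toy data (not the asker's real tables)
data1 <- data.table(month1 = rep(1:12, times = 3),
                    year   = rep(2014:2016, each = 12))
data2 <- data.table(month2 = 1:12)

set.seed(42)
i <- 3          # month used for the first sample
n_draws <- 100  # rows drawn per sample

sub1 <- data1[month1 == i]  # rows where month1 equals i
sub2 <- data2[month2 > i]   # rows where month2 is strictly later than i

# Draw row indices with replacement, then subset the rows
s1 <- sub1[sample.int(nrow(sub1), n_draws, replace = TRUE)]
s2 <- sub2[sample.int(nrow(sub2), n_draws, replace = TRUE)]

dt1 <- cbind(s1, s2)  # 100 rows with columns month1, year, month2
```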

I want to repeat these steps for every month i and create a new data set with all the resulting samples (pseudo-code):

 for (i in 1:10) {
   s1_i  = sample(data1[month1 == i], 100, replace=TRUE)
   s2_i  = sample(data2[month2 > i], 100, replace=TRUE)
   new_i = cbind(s1_i, s2_i)
 }
 allsamples = rbind(new_1, new_2, new_3, ...)

I am having trouble writing this loop: it should not create a separate data set at every step, but only the allsamples data set, in which all samples are combined.


Solution

  • Here is my solution:

      library(data.table)  # for rbindlist()

      newsample <- list()
      begin_time <- 1
      end_time <- 20
      for (i in begin_time:end_time) {
          datasub1 <- data1[data1$month1 == i, ]  # filter data1 on the condition
          s1 <- datasub1[sample(nrow(datasub1), 10, replace = TRUE), ]  # sample 10 rows with replacement
          datasub2 <- data2[data2$month2 > i, ]  # keep only months after i
          s2 <- datasub2[sample(nrow(datasub2), 10, replace = TRUE), ]
          newsample[[i - begin_time + 1]] <- cbind(s1, s2)  # combine and store in list
      }
      allsample <- rbindlist(newsample)  # stack samples as one data table
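A variant of the same idea, sketched here under the assumption of the hypothetical toy tables, replaces the explicit loop and index bookkeeping with lapply(). It also skips months for which either subset is empty (with this toy data, month 12 has no later month in data2), since cbind() of a 100-row and a 0-row table would fail or give misaligned results. draw_pair is a hypothetical helper name; rbindlist() ignores the NULL entries it returns.

```r
library(data.table)

# Hypothetical toy data standing in for the real data1/data2
data1 <- data.table(month1 = rep(1:12, times = 3),
                    year   = rep(2014:2016, each = 12))
data2 <- data.table(month2 = 1:12)

# Draw n_draws rows for month i from data1 and from later months in data2
draw_pair <- function(i, n_draws = 100) {
  sub1 <- data1[month1 == i]
  sub2 <- data2[month2 > i]
  # Nothing to sample from: return NULL so rbindlist() drops this month
  if (nrow(sub1) == 0L || nrow(sub2) == 0L) return(NULL)
  s1 <- sub1[sample.int(nrow(sub1), n_draws, replace = TRUE)]
  s2 <- sub2[sample.int(nrow(sub2), n_draws, replace = TRUE)]
  cbind(s1, s2)
}

set.seed(123)
allsample <- rbindlist(lapply(1:12, draw_pair))
```

With these toy tables, months 1 to 11 each contribute 100 rows and month 12 is skipped, so allsample has 1100 rows.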