Tags: r, loops, for-loop, dplyr, sample

How to sample from another dataset based on factors?


I have two data sets. `df_sample` contains measured values of `param` across sites and quadrants (with replicates). I want to use this dataset to populate `df`.

set.seed(111)

#This is the dataset I want to draw the sample from
site <- rep(c("1","2","3"), each = 20)
quad <- rep(c("1","2","3","4","5"), times = 12)
param <- rnorm(60,5,1)

df_sample <- data.frame(site,quad, param)


#This is the dataset I want to add the sampling to
month <- rep(c("J","J","J","F","M"), each = 5)
site <- rep(c("1","2","3","1","2"), each = 5)
quad <- rep(c("1","2","3","4","5"), times = 5)

df <- data.frame(month,site,quad)
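Before sampling, it can help to confirm how many candidate values each site/quadrant pair offers. The sketch below rebuilds `df_sample` (writing `times = 12` for the quadrant vector) and cross-tabulates it with base R's `table()`. With this construction every cell should be 4, because each site's 20 rows cycle through the five quadrants four times.

```r
# Rebuild df_sample as in the question, using times = 12 for quad
set.seed(111)
df_sample <- data.frame(
  site  = rep(c("1", "2", "3"), each = 20),
  quad  = rep(c("1", "2", "3", "4", "5"), times = 12),
  param = rnorm(60, 5, 1)
)

# Cross-tabulate site by quad: each cell is the number of replicate
# values available to sample from for that combination
counts <- table(site = df_sample$site, quad = df_sample$quad)
print(counts)
```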

Notice that `df_sample` has no month column: it only records `param`, measured several times in each quadrant at each site. In `df` I want to create a new column `param`. For each row (month, site, quadrant), the value should be drawn at random from only those `df_sample` values with the same site and quadrant. With this setup every site/quadrant combination has four replicates, so each row can take one of four values. How can I achieve this?

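My naive attempt below draws from all of `df_sample$param`, ignoring site and quadrant, and it returns 60 values for a 25-row `df`, so R rejects the assignment:

df$param <- sample(df_sample$param)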

For example, `df` could look like this:

month site quad param
  J    1     1    4.8236
  J    1     2    3.502
  J    1     3    6.84
 ...

Solution

  • With data.table:

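    library(data.table)
    # Collect each site/quad's replicate values into a list column, join it onto df,
    # then draw one value per row from that row's list of candidates
    setDT(df_sample)[, list(param = list(param)), by = list(site, quad)][
      setDT(df), on = c("site", "quad")][, param := sapply(param, sample, 1)][]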
     
       site quad    param month
     1:    1    1 4.826326     J
     2:    1    2 3.502573     J
     3:    1    3 6.845636     J
     4:    1    4 5.394054     J
     5:    1    5 4.506038     J
     6:    2    1 1.886783     J
     7:    2    2 4.058643     J
     8:    2    3 5.334256     J
     9:    2    4 4.840423     J
    10:    2    5 5.326549     J
    11:    3    1 4.152732     J
    12:    3    2 5.331380     J
    13:    3    3 6.805868     J
    14:    3    4 7.485662     J
    15:    3    5 5.741972     J
    16:    1    1 5.140278     F
    17:    1    2 4.593401     F
    18:    1    3 6.845636     F
    19:    1    4 5.394054     F
    20:    1    5 5.797529     F
    21:    2    1 1.886783     M
    22:    2    2 4.883845     M
    23:    2    3 5.189737     M
    24:    2    4 4.379142     M
    25:    2    5 2.734004     M
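A base R variant of the same idea (no data.table), offered as a sketch that assumes the question's data with `times =` in place of `rep =`: split `df_sample$param` into one pool per site/quad pair, then draw one value from each row's pool. The object names `pools` and `keys` are illustrative. Drawing an index with `sample.int()` sidesteps a quirk of `sample()`: given a single number x >= 1, it samples from `1:x`, so `sapply(param, sample, 1)` would return an integer instead of the measured value for any site/quad pair that had only one replicate. That cannot happen with the example data (four replicates per pair), but it can with real data.

```r
set.seed(111)

# Rebuild the example data from the question (times = instead of rep =)
df_sample <- data.frame(
  site  = rep(c("1", "2", "3"), each = 20),
  quad  = rep(c("1", "2", "3", "4", "5"), times = 12),
  param = rnorm(60, 5, 1)
)
df <- data.frame(
  month = rep(c("J", "J", "J", "F", "M"), each = 5),
  site  = rep(c("1", "2", "3", "1", "2"), each = 5),
  quad  = rep(c("1", "2", "3", "4", "5"), times = 5)
)

# One vector of measured values per site/quad combination, named e.g. "1_3"
pools <- split(df_sample$param, paste(df_sample$site, df_sample$quad, sep = "_"))

# Look up each row's pool and draw one element of it by index;
# sample.int() avoids sample()'s special case for a single numeric value
keys <- paste(df$site, df$quad, sep = "_")
df$param <- unname(vapply(pools[keys],
                          function(v) v[sample.int(length(v), 1)],
                          numeric(1)))

head(df)
```

Because the seed and sampling order differ from the data.table version, the drawn values will not match the output shown above; only the constraint (each value comes from its own site/quad pool) is the same.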