I have two data sets. df_sample contains measurements of a variable, param, across sites and quadrants (with replicates). I want to use this dataset to populate df.
set.seed(111)
# Dataset to draw the samples from: 3 sites x 5 quadrants x 4 replicates
site <- rep(c("1","2","3"), each = 20)
quad <- rep(c("1","2","3","4","5"), times = 12)
param <- rnorm(60,5,1)
df_sample <- data.frame(site,quad, param)
# Dataset to add the sampled values to
month <- rep(c("J","J","J","F","M"), each = 5)
site <- rep(c("1","2","3","1","2"), each = 5)
quad <- rep(c("1","2","3","4","5"), times = 5)
df <- data.frame(month,site,quad)
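To see how many replicates each site/quadrant combination actually has, a base R cross-tabulation is enough. This sketch rebuilds df_sample with `times =` spelled out for the quadrant vector:

```r
set.seed(111)
site <- rep(c("1", "2", "3"), each = 20)
quad <- rep(c("1", "2", "3", "4", "5"), times = 12)
param <- rnorm(60, 5, 1)
df_sample <- data.frame(site, quad, param)

# Cross-tabulate: rows are sites, columns are quadrants, cells count replicates
reps <- table(df_sample$site, df_sample$quad)
print(reps)
```

Every cell of the 3 x 5 table is 4, i.e. each site/quadrant pool holds four measured values.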
Notice that the first dataset only contains the sites where param was measured, with several replicates per quadrant. In df I want to create a new column, param. For each month and site, its value should be drawn at random only from the replicates of the corresponding site and quadrant in df_sample. With this design every site/quadrant combination has four replicates, so each row can take one of four values. How can I achieve this?
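Shuffling the whole column, as below, ignores site and quadrant altogether; it also fails outright, because df_sample has 60 rows while df has only 25:
df$param <- sample(df_sample$param)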
As an example, `df` could look like this:
month site quad param
J 1 1 4.8236
J 1 2 3.502
J 1 3 6.84
...
With data.table:
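library(data.table)
# Collapse df_sample to one row per site/quad holding all replicates in a list column,
# join that onto every row of df, then draw one replicate per row
setDT(df_sample)[, list(param = list(param)), by = list(site, quad)][
  setDT(df), on = c("site", "quad")][, param := sapply(param, sample, 1)][]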
site quad param month
1: 1 1 4.826326 J
2: 1 2 3.502573 J
3: 1 3 6.845636 J
4: 1 4 5.394054 J
5: 1 5 4.506038 J
6: 2 1 1.886783 J
7: 2 2 4.058643 J
8: 2 3 5.334256 J
9: 2 4 4.840423 J
10: 2 5 5.326549 J
11: 3 1 4.152732 J
12: 3 2 5.331380 J
13: 3 3 6.805868 J
14: 3 4 7.485662 J
15: 3 5 5.741972 J
16: 1 1 5.140278 F
17: 1 2 4.593401 F
18: 1 3 6.845636 F
19: 1 4 5.394054 F
20: 1 5 5.797529 F
21: 2 1 1.886783 M
22: 2 2 4.883845 M
23: 2 3 5.189737 M
24: 2 4 4.379142 M
25: 2 5 2.734004 M
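For comparison, the same per-row draw can be sketched in base R without data.table. This is an alternative under stated assumptions, not the answer's method: `split()` builds one pool of replicates per site/quadrant, and `pick_one()` is a hypothetical helper that draws one value for a given site and quadrant. `sapply(param, sample, 1)` is safe in this data because every pool has four values, but `sample()` applied to a single numeric value x >= 1 draws from `1:x` instead of returning x, so the sketch indexes the pool with `sample.int()`:

```r
set.seed(111)
# Rebuild both data sets (quad uses times =, as intended in the question)
df_sample <- data.frame(
  site  = rep(c("1", "2", "3"), each = 20),
  quad  = rep(c("1", "2", "3", "4", "5"), times = 12),
  param = rnorm(60, 5, 1)
)
df <- data.frame(
  month = rep(c("J", "J", "J", "F", "M"), each = 5),
  site  = rep(c("1", "2", "3", "1", "2"), each = 5),
  quad  = rep(c("1", "2", "3", "4", "5"), times = 5)
)

# One vector of replicates per site/quadrant; names look like "1.3" (site.quad)
pools <- split(df_sample$param, list(df_sample$site, df_sample$quad))

# Hypothetical helper: draw one replicate from the pool matching (s, q).
# sample.int() picks an index, so a single-value pool is returned unchanged.
pick_one <- function(s, q) {
  pool <- pools[[paste(s, q, sep = ".")]]
  pool[sample.int(length(pool), 1)]
}

# Independent draw (with replacement across rows) for every row of df
df$param <- mapply(pick_one, df$site, df$quad, USE.NAMES = FALSE)
print(df)
```

Because each row draws independently, two rows with the same site and quadrant can receive the same value, as rows 3 and 18 (site 1, quadrant 3) do in the output above.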