I'm trying to generate a cross-sectional time series dataset to evaluate the use of different models. The dataset has an ID variable and a time variable. I'd like to add a normally distributed variable that depends on both identifiers. In other words, how do I create a variable in R that recognizes both ID and time? If my question is unclear, feel free to ask. Thanks in advance.
df2 <- read.table(
text =
"Year,ID,H,
1,1,N(2.3),
2,1,N(2.3),
3,1,N(2.3),
1,2,N(0.1),
2,2,N(0.1),
3,2,N(0.1),
", sep = ",", header = TRUE)
Assuming that the data in the data frame df looks like this:
ID | Time |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
1 | 4 |
2 | 1 |
2 | 2 |
2 | 3 |
2 | 4 |
3 | 1 |
3 | 2 |
3 | 3 |
3 | 4 |
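you can generate a variable y that depends on both ID and time as the sum of two independent normal draws, one whose mean and standard deviation depend on ID and one whose mean and standard deviation depend on time (the sum of independent normal variables is itself normally distributed):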
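set.seed(42)  # make the random draws reproducible
# 3 IDs observed at 4 time points each, matching the table above
df = data.frame(
  ID = rep(1:3, each=4),
  time = rep(1:4, times=3)
)
# Sum of two independent normal draws: one driven by ID, one by time
df$y = rnorm(nrow(df), mean=df$ID, sd=1+0.1*df$ID) +
  rnorm(nrow(df), mean=df$time, sd=0.05*df$time)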
# Output:
   ID time         y
1   1    1  3.438611
2   1    2  2.350953
3   1    3  4.379443
4   1    4  5.823339
5   2    1  3.470909
6   2    2  3.607005
7   2    3  6.447756
8   2    4  6.150432
9   3    1  6.608619
10  3    2  4.740341
11  3    3  7.670543
12  3    4 10.215574
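For reference, the distribution implied by the rnorm calls above can be written out as follows. The symbols a and b are my own shorthand for the two draws, with i standing for ID and t for time; they do not appear in the code.

```latex
\begin{align*}
a_{it} &\sim \mathcal{N}\!\left(i,\ (1 + 0.1\,i)^2\right), \qquad
b_{it} \sim \mathcal{N}\!\left(t,\ (0.05\,t)^2\right), \qquad
a_{it} \text{ and } b_{it} \text{ independent} \\
y_{it} &= a_{it} + b_{it} \sim \mathcal{N}\!\left(i + t,\ (1 + 0.1\,i)^2 + (0.05\,t)^2\right)
\end{align*}
```

So both the mean and the variance of y change with ID and with time, and each row gets its own fresh pair of draws.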
Note that the underlying normal distributions depend on both ID and time. That is in contrast to your example table above, where H appears to depend solely on ID, resulting in a single normal distribution per ID that is independent of the time variable.
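If the ID-only case from your table is what you actually want, a minimal sketch could look like the following. It assumes that N(2.3) and N(0.1) in your table denote normal distributions with means 2.3 and 0.1, and it picks a standard deviation of 1 purely for illustration; the lookup vector id_means is a hypothetical helper, not something from your post.

```r
set.seed(42)  # make the random draws reproducible

# Assumed per-ID means, read off the question's table (ID 1 -> 2.3, ID 2 -> 0.1)
id_means <- c("1" = 2.3, "2" = 0.1)

# All Year/ID combinations; Year varies fastest, as in the question's table
df2 <- expand.grid(Year = 1:3, ID = 1:2)

# One draw per row; the mean is looked up by ID only, so Year plays no role.
# sd = 1 is an arbitrary choice for this sketch.
df2$H <- rnorm(nrow(df2), mean = id_means[as.character(df2$ID)], sd = 1)

print(df2)
```

Each row still receives its own random draw, but all rows with the same ID share the same underlying distribution. To make H identical across years within an ID instead, you could draw one value per ID and repeat it with rep().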