r dataframe time-series simulation panel

How to make a normally distributed variable depend on entries and time in R?

I'm trying to generate a cross-sectional time-series (panel) dataset so that I can evaluate different models on it. The dataset has an ID variable and a time variable. I'd like to add a normally distributed variable that depends on both identifiers. In other words, how do I create a variable that recognizes both ID and time in R? If my question is unclear, feel free to ask for clarification. Thanks in advance.

# N(2.3) and N(0.1) are placeholders for "a draw from a normal distribution"
df2 <- read.table(
text =
"Year,ID,H
1,1,N(2.3)
2,1,N(2.3)
3,1,N(2.3)
1,2,N(0.1)
2,2,N(0.1)
3,2,N(0.1)", sep = ",", header = TRUE)

Solution

Assuming that the data in the data frame df looks like

ID	time
1	1
1	2
1	3
1	4
2	1
2	2
2	3
2	4
3	1
3	2
3	3
3	4

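you can generate a variable y that depends on both ID and time as the sum of two independent normal draws: one whose mean and standard deviation depend on ID, and one whose mean and standard deviation depend on time. The sum of independent normal variables is again normal, so with the parameters used here y has mean ID + time and variance (1 + 0.1*ID)^2 + (0.05*time)^2: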

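set.seed(42)  # fix the seed so the draws are reproducible

# Balanced panel: 3 IDs, each observed at 4 time points
df <- data.frame(
  ID   = rep(1:3, each  = 4),
  time = rep(1:4, times = 3)
)

# y = ID-dependent normal draw + time-dependent normal draw
df$y <- rnorm(nrow(df), mean = df$ID,   sd = 1 + 0.1 * df$ID) +
        rnorm(nrow(df), mean = df$time, sd = 0.05 * df$time)

df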

# Output:
   ID time         y
1   1    1  3.438611
2   1    2  2.350953
3   1    3  4.379443
4   1    4  5.823339
5   2    1  3.470909
6   2    2  3.607005
7   2    3  6.447756
8   2    4  6.150432
9   3    1  6.608619
10  3    2  4.740341
11  3    3  7.670543
12  3    4 10.215574
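
As a rough sanity check, the distribution implied for a single (ID, time) cell can be compared against simulated draws. This is only a sketch: the cell (ID = 2, time = 3), the seed, the number of draws, and the names id_val, time_val, and n are arbitrary choices made for illustration.

```r
set.seed(1)      # arbitrary seed, only for reproducibility

id_val   <- 2    # illustrative cell: ID = 2
time_val <- 3    # illustrative cell: time = 3
n        <- 1e5  # number of simulated draws for this one cell

# Same construction as for df$y, but repeated n times for one cell
y <- rnorm(n, mean = id_val,   sd = 1 + 0.1 * id_val) +
     rnorm(n, mean = time_val, sd = 0.05 * time_val)

# Mean and sd of a sum of two independent normal variables
theoretical_mean <- id_val + time_val
theoretical_sd   <- sqrt((1 + 0.1 * id_val)^2 + (0.05 * time_val)^2)

# Sample moments next to the theoretical ones
c(sample_mean = mean(y), theoretical_mean = theoretical_mean)
c(sample_sd   = sd(y),   theoretical_sd   = theoretical_sd)
```

The sample mean and standard deviation should land close to the theoretical values, which suggests that each cell of the panel has its own normal distribution determined by both ID and time.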

Note that the underlying normal distributions depend on both ID and time. This contrasts with the example table in your question, where H appears to depend solely on ID, i.e. there is a single normal distribution per ID that does not change with the time variable.
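
If the table in the question is in fact what you want (one normal distribution per ID, shared across all years), a lookup vector of per-ID means is one way to do it. This is a minimal sketch under two assumptions: N(2.3) and N(0.1) are read as means of 2.3 and 0.1, and the standard deviation is set to 1 because the question does not specify one. The name id_means is made up for this example.

```r
set.seed(42)

# Assumed per-ID means, read off the question's table (ID 1 -> 2.3, ID 2 -> 0.1)
id_means <- c(2.3, 0.1)

# Same layout as the question's table: 2 IDs, 3 years each
df2 <- data.frame(
  Year = rep(1:3, times = 2),
  ID   = rep(1:2, each  = 3)
)

# Index the means by ID so every row of a given ID draws from the same distribution;
# sd = 1 is an assumption, since the question gives no standard deviation
df2$H <- rnorm(nrow(df2), mean = id_means[df2$ID], sd = 1)

df2
```

Indexing id_means by df2$ID works here because the IDs are the integers 1 and 2. For arbitrary ID labels, a named vector indexed by as.character(df2$ID) would play the same role.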