I have a data.frame in R containing several categorical variables, each with its own mean and standard deviation. I want to draw values from the normal distribution defined by each variable's mean and standard deviation, and store the draws in a separate data.frame for each variable.
Here's some dummy data:
dummy_data <- data.frame(VARIABLE = LETTERS[seq(from = 1, to = 10)],
                         MEAN = runif(10, 5, 10), SD = runif(10, 1, 3))
dummy_data
   VARIABLE     MEAN       SD
1         A 6.278751 1.937093
2         B 6.384247 2.487678
3         C 9.017496 2.003202
4         D 5.125994 1.829517
5         E 9.525213 1.914513
6         F 9.004893 2.734934
7         G 9.780757 2.511341
8         H 5.372160 1.510281
9         I 6.240331 2.796826
10        J 8.478280 2.325139
What I'd like to do from here is generate an individual data.frame for each row, each containing draws from a normal distribution defined by that row's MEAN and SD columns.
So, for example, I'd have a separate data.frame for A that contained:
A <- subset(dummy_data, VARIABLE == 'A')
A <- data.frame(rnorm(20, A$MEAN, A$SD))
A
rnorm.20..A.MEAN..A.SD.
1 5.131331
2 9.388104
3 8.909453
4 5.813257
5 5.353137
6 7.598521
7 2.693924
8 5.425703
9 8.939687
10 9.148066
11 4.528936
12 7.576479
13 8.207456
14 6.838258
15 6.972061
16 7.824283
17 6.283434
18 4.503815
19 2.133388
20 7.472886
The real data I'm working with is much larger than ten rows, so I don't want to subset it row by row to generate the individual data.frames if I can help it.
Thanks in advance
Using data.table:
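library(data.table)
# setDT() converts dummy_data to a data.table by reference; draw 20 values per VARIABLE
result <- setDT(dummy_data)[, .(sample = rnorm(20, mean = MEAN, sd = SD)), by = .(VARIABLE)]
# Split the long result into a named list with one table per VARIABLE
list.of.df <- split(result, result$VARIABLE)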
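If you'd rather not depend on data.table, the same idea can be sketched in base R. This is a minimal sketch, not a tested drop-in: it rebuilds the question's dummy_data so it runs on its own, and `set.seed()`, `n_draws` and `list_of_df` are names I've introduced for illustration. It assumes each VARIABLE appears in exactly one row, as in the example.

```r
# Stand-in for the question's dummy_data; set.seed() is only here for reproducibility
set.seed(42)
dummy_data <- data.frame(
  VARIABLE = LETTERS[1:10],
  MEAN = runif(10, 5, 10),
  SD = runif(10, 1, 3)
)

n_draws <- 20  # draws per variable, matching the rnorm(20, ...) in the question

# Map() walks over MEAN and SD in parallel and returns one data.frame per row
list_of_df <- Map(
  function(m, s) data.frame(sample = rnorm(n_draws, mean = m, sd = s)),
  dummy_data$MEAN,
  dummy_data$SD
)

# Name each element after its VARIABLE so it can be looked up directly
names(list_of_df) <- as.character(dummy_data$VARIABLE)

head(list_of_df$A)
```

Either way you end up with a named list, so an individual table is available as `list_of_df$A` or `list_of_df[["A"]]`, which tends to be easier to loop over than ten separate objects in the workspace.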