Generate individual data distributions using mean and standard deviation data from a data frame in R


I have a data.frame in R in which each row is one level of a categorical variable with its own mean and standard deviation. For each level, I want to draw values from a normal distribution defined by that mean and standard deviation, and store the draws in a separate data.frame per level.

Here's some dummy data:

dummy_data <- data.frame(VARIABLE = LETTERS[seq( from = 1, to = 10 )],
                         MEAN = runif(10, 5, 10), SD = runif(10, 1, 3))

dummy_data

   VARIABLE     MEAN       SD
1         A 6.278751 1.937093
2         B 6.384247 2.487678
3         C 9.017496 2.003202
4         D 5.125994 1.829517
5         E 9.525213 1.914513
6         F 9.004893 2.734934
7         G 9.780757 2.511341
8         H 5.372160 1.510281
9         I 6.240331 2.796826
10        J 8.478280 2.325139

What I'd like to do from here is generate an individual data.frame for each row, each containing a sample drawn from a normal distribution with that row's MEAN and SD.

So, for example, I'd have a separate data.frame that contained:

A <- subset(dummy_data, VARIABLE == 'A')
A <- data.frame(rnorm(20,  A$MEAN, A$SD))

A

   rnorm.20..A.MEAN..A.SD.
1                 5.131331
2                 9.388104
3                 8.909453
4                 5.813257
5                 5.353137
6                 7.598521
7                 2.693924
8                 5.425703
9                 8.939687
10                9.148066
11                4.528936
12                7.576479
13                8.207456
14                6.838258
15                6.972061
16                7.824283
17                6.283434
18                4.503815
19                2.133388
20                7.472886

The real data I'm working with has far more than ten rows, so I'd rather not subset it row by row to generate the individual data.frames if I can help it.

Thanks in advance


Solution

  • Using data.table:

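    library(data.table)
    # setDT() converts dummy_data to a data.table by reference; rnorm() then runs once per VARIABLE
    result     <- setDT(dummy_data)[, .(sample = rnorm(20, mean = MEAN, sd = SD)), by = .(VARIABLE)]
    # split the long result into a named list with one table of 20 draws per VARIABLE
    list.of.df <- split(result, result$VARIABLE)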
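
If data.table is not available, roughly the same result can be sketched in base R. The following is only an illustration under a few assumptions: the seed, the name `n_draws`, and the column name `sample` are hypothetical choices made here, not part of the original question or answer. `Map()` walks the MEAN and SD columns in parallel and calls `rnorm()` once per row.

```r
# Minimal base R sketch: one data.frame of draws per row of dummy_data.
set.seed(42)  # assumption: fixed seed so the example is reproducible

dummy_data <- data.frame(VARIABLE = LETTERS[1:10],
                         MEAN = runif(10, 5, 10),
                         SD = runif(10, 1, 3))

n_draws <- 20  # hypothetical name: number of values drawn per row

# Map() pairs each MEAN with its SD and returns a list with one element per row
list_of_df <- Map(function(m, s) data.frame(sample = rnorm(n_draws, mean = m, sd = s)),
                  dummy_data$MEAN, dummy_data$SD)

# Name each list element after its VARIABLE so it can be looked up directly
names(list_of_df) <- as.character(dummy_data$VARIABLE)

head(list_of_df[["A"]])
```

Each element can then be retrieved by name, for example `list_of_df[["A"]]`, so there is no need to subset `dummy_data` by hand. Keeping the results in a named list rather than creating ten separate objects also scales to a larger number of rows.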