I'm struggling with the following task: I need to generate data from a truncated normal distribution. The sample mean and standard deviation should exactly match those specified for the population. This is what I have so far:
mean <- 100
sd <- 5
lower <- 40
upper <- 120
n <- 100
library(msm)
data <- as.numeric(mean+sd*scale(rtnorm(n, lower=40, upper=120)))
The sample that's created takes on exactly the mean and sd specified in the population. But some values exceed the intended bounds. Any idea how to fix this? I was thinking of just cutting off all values outside these bounds, but then mean and sd don't resemble those of the population anymore.
You could use an iterative approach. Here I add samples one by one to the vector, but only if the rescaled dataset stays within the bounds that you set. It takes longer, but it works:
library(msm)

n <- 10000
mean <- 100
sd <- 15
lower <- 40
upper <- 120
# Truncation limits on the standard-normal scale
z_lower <- (lower - mean) / sd
z_upper <- (upper - mean) / sd
data <- rtnorm(1, lower = z_lower, upper = z_upper)
while (length(data) < n) {
  new_sample <- rtnorm(1, lower = z_lower, upper = z_upper)
  data_copy <- c(data, new_sample)
  # Rescale the candidate dataset to the target mean and sd
  data_copy_scaled <- mean + sd * scale(data_copy)
  # Keep the new draw only if every rescaled value stays within the bounds
  if (min(data_copy_scaled) >= lower && max(data_copy_scaled) <= upper) {
    data <- data_copy
  }
}
scaled_data <- as.numeric(mean + sd * scale(data))
summary(scaled_data)
Min. 1st Qu. Median Mean 3rd Qu. Max.
40.38 91.61 104.35 100.00 111.28 120.00
sd(scaled_data)
15
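If you want to reuse this, the loop can be wrapped in a function. The following is a minimal sketch, not a definitive implementation: `exact_tnorm`, `rtnorm1` and `max_draws` are names made up here, and `rtnorm1` replaces `msm::rtnorm` with base-R inverse-CDF sampling (an assumption made only to keep the example free of dependencies; it is less accurate when the bounds lie far out in the tails).

```r
# Hypothetical helper (not part of msm): one draw from a standard normal
# truncated to [a, b], using the inverse-CDF method in base R.
rtnorm1 <- function(a, b) qnorm(runif(1, pnorm(a), pnorm(b)))

# Hypothetical wrapper around the accept/reject loop, with a cap on the
# total number of draws so it cannot run forever.
exact_tnorm <- function(n, mu, sigma, lower, upper, max_draws = 1e6) {
  a <- (lower - mu) / sigma   # bounds on the standard-normal scale
  b <- (upper - mu) / sigma
  x <- rtnorm1(a, b)
  draws <- 1
  while (length(x) < n) {
    if (draws >= max_draws) stop("draw budget exhausted before reaching n")
    cand <- c(x, rtnorm1(a, b))
    draws <- draws + 1
    # Standardized min and max of the candidate dataset
    z <- (range(cand) - mean(cand)) / sd(cand)
    # Accept the new draw only if the rescaled extremes stay in bounds
    if (mu + sigma * z[1] >= lower && mu + sigma * z[2] <= upper) x <- cand
  }
  # Rescale so the sample mean and sd match mu and sigma
  mu + sigma * ((x - mean(x)) / sd(x))
}

set.seed(1)
out <- exact_tnorm(200, mu = 100, sigma = 15, lower = 40, upper = 120)
c(mean = mean(out), sd = sd(out), min = min(out), max = max(out))
```

Only the minimum and maximum are checked, because if the two extremes stay within the bounds after rescaling, every other value does too.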
Below is my old answer, which doesn't quite work, because the rescaling step after sampling can still push values past the bounds:
How about scaling the lower and upper limits of rtnorm with the mean and sd that you want?
n <- 1000000
mean <- 100
sd <- 5
library(msm)
data <- as.numeric(mean+sd*scale(rtnorm(n, lower=((40 - mean)/sd), upper=((120 - mean)/sd))))
summary(data)
Min. 1st Qu. Median Mean 3rd Qu. Max.
76.91 96.63 100.00 100.00 103.37 120.00
sd(data)
5
In this case, even with a sample of 1,000,000, you get the exact mean and sd, and the min and max values stay within your boundaries.
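To see how the old answer can fail, try it with a bound that sits close to the mean. The sketch below uses sd = 15, so the upper bound of 120 is only about 1.33 standard deviations above the mean. The truncated draws then have a mean below 0 and an sd below 1, and standardizing them with `scale()` stretches the largest values past the limit. As an assumption to keep it self-contained, the draws come from base-R inverse-CDF sampling rather than `msm::rtnorm`, and the variable names are illustrative.

```r
set.seed(1)
mu <- 100
sigma <- 15
lower <- 40
upper <- 120

# Bounds on the standard-normal scale
a <- (lower - mu) / sigma
b <- (upper - mu) / sigma

# Truncated standard-normal draws via the inverse CDF (stand-in for rtnorm)
z <- qnorm(runif(1e5, pnorm(a), pnorm(b)))

# Old approach: standardize, then rescale to the target mean and sd
x <- mu + sigma * as.numeric(scale(z))

range(x)        # compare against lower and upper
max(x) > upper  # TRUE here: the rescaled maximum exceeds the upper bound
```

With sd = 5 the bounds are so far from the mean that the truncation barely changes the sample, which is why the run above happened to stay inside them.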