I'm struggling with the following task: I need to generate data from a truncated normal distribution whose sample mean and standard deviation exactly match those specified for the population. This is what I have so far:
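```r
library(msm)  # assumed source of rtnorm(n, mean = 0, sd = 1, lower, upper)

mean <- 100
sd <- 5
lower <- 40
upper <- 120
n <- 100
# Note: msm's rtnorm() defaults to mean = 0, sd = 1, so these bounds are
# applied to a standard normal before the draws are rescaled
data <- as.numeric(mean + sd * scale(rtnorm(n, lower = 40, upper = 120)))
```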
The resulting sample has exactly the mean and sd specified for the population, but some values fall outside the intended bounds. Any idea how to fix this? I thought about simply cutting off all values outside the bounds, but then the sample mean and sd no longer match those of the population.
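The three requirements in the question (exact sample mean, exact sample sd, every value within the bounds) can be bundled into a single check. This is a minimal sketch: `meets_spec()` and its `tol` argument are hypothetical names, and the tolerance is an assumption, since "exactly" can only hold up to floating-point error.

```r
# Hypothetical helper: TRUE if x has (within tol) the target mean and sd
# and every value lies in [lower, upper].
meets_spec <- function(x, target_mean, target_sd, lower, upper, tol = 1e-8) {
  abs(mean(x) - target_mean) < tol &&
    abs(sd(x) - target_sd) < tol &&
    all(x >= lower & x <= upper)
}

x <- c(95, 105)                        # sample mean 100, sample sd sqrt(50)
meets_spec(x, 100, sqrt(50), 40, 120)  # TRUE
meets_spec(x, 100, 5, 40, 120)         # FALSE: the sd does not match
```

Clipping a rescaled sample with `pmin()`/`pmax()` would make the last condition hold while generally breaking the first two, which is exactly the trade-off described above.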
You could use an iterative approach. Here I add draws to the vector one at a time, keeping each one only if the rescaled dataset stays within the bounds you set. It takes longer, but it works:
```r
library(msm)  # assumed source of rtnorm()

n <- 10000
mean <- 100
sd <- 15
lower <- 40
upper <- 120
# Bounds expressed on the standard-normal scale
a <- (lower - mean) / sd
b <- (upper - mean) / sd
data <- rtnorm(1, lower = a, upper = b)
while (length(data) < n) {
  draw <- rtnorm(1, lower = a, upper = b)
  data_copy <- c(data, draw)
  # Rescale the candidate set to exactly the target mean and sd
  data_copy_scaled <- mean + sd * scale(data_copy)
  # Keep the draw only if the rescaled set stays inside [lower, upper]
  if (min(data_copy_scaled) >= lower && max(data_copy_scaled) <= upper) {
    data <- data_copy
  }
}
scaled_data <- as.numeric(mean + sd * scale(data))
summary(scaled_data)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  40.38   91.61  104.35  100.00  111.28  120.00
```
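Most of the extra time in the loop above goes into re-running `scale()` on the whole vector for every candidate draw. A sketch of the same accept-or-reject rule that instead tracks the running sum, sum of squares, minimum and maximum, so each check costs O(1). Assumptions: `exact_tnorm_iter()` and `rtnorm_base()` are hypothetical names, and `rtnorm_base()` is a base-R inverse-CDF sampler swapped in for `msm::rtnorm` only so the example runs without extra packages; bounds hold up to floating-point rounding, because the check uses running sums while the final rescaling uses `scale()`.

```r
# rtnorm_base() is a hypothetical base-R stand-in for msm::rtnorm():
# an inverse-CDF draw from a standard normal truncated to [a, b].
rtnorm_base <- function(n, a, b) {
  qnorm(runif(n, pnorm(a), pnorm(b)))
}

# Hypothetical helper: grow the sample one accepted draw at a time,
# tracking sum, sum of squares, min and max so each check is O(1).
exact_tnorm_iter <- function(n, mean, sd, lower, upper) {
  a <- (lower - mean) / sd
  b <- (upper - mean) / sd
  data <- numeric(n)
  data[1] <- rtnorm_base(1, a, b)
  k <- 1
  s1 <- data[1]
  s2 <- data[1]^2
  lo <- data[1]
  hi <- data[1]
  while (k < n) {
    x <- rtnorm_base(1, a, b)
    k1 <- k + 1
    t1 <- s1 + x
    t2 <- s2 + x^2
    m <- t1 / k1
    s <- sqrt((t2 - k1 * m^2) / (k1 - 1))  # sample sd (n - 1), as scale() uses
    new_lo <- min(lo, x)
    new_hi <- max(hi, x)
    # Same test as the loop above: would the rescaled set stay in bounds?
    if (mean + sd * (new_lo - m) / s >= lower &&
        mean + sd * (new_hi - m) / s <= upper) {
      k <- k1
      data[k] <- x
      s1 <- t1
      s2 <- t2
      lo <- new_lo
      hi <- new_hi
    }
  }
  as.numeric(mean + sd * scale(data))
}

set.seed(1)
x <- exact_tnorm_iter(1000, mean = 100, sd = 15, lower = 40, upper = 120)
summary(x)
```

The sum-of-squares formula is applied to standardized values (roughly between -4 and 1.33 here), so cancellation should not be a practical concern at these sample sizes.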
Below is my old answer, which doesn't quite work:
How about scaling the lower and upper limits of rtnorm with the mean and sd that you want?
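With the question's numbers (mean 100, sd 5, bounds 40 and 120), the standardized limits passed to `rtnorm` and the rescaling that `mean + sd*scale(x)` applies to each draw x_i work out as:

```latex
a = \frac{40 - 100}{5} = -12, \qquad b = \frac{120 - 100}{5} = 4,
\qquad y_i = 100 + 5\,\frac{x_i - \bar{x}}{s_x}
```

Here x̄ and s_x are the sample mean and sample sd of the draws. Every x_i lies in [a, b], but y_i lies in [40, 120] only if (x_i - x̄)/s_x lies in [a, b], which is not guaranteed because x̄ and s_x are not exactly 0 and 1.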
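```r
library(msm)  # assumed source of rtnorm()
n <- 1000000
mean <- 100
sd <- 5
# Standardize the bounds, draw, then rescale to exactly mean and sd
data <- as.numeric(mean + sd * scale(rtnorm(n, lower = (40 - mean) / sd, upper = (120 - mean) / sd)))
summary(data)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  76.91   96.63  100.00  100.00  103.37  120.00
```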
In this case, even with a sample of 1000000 you get exactly the target mean and sd, and in this run the minimum and maximum stay within your bounds.
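A likely reason this old approach doesn't quite work: `scale()` divides by the sample sd, and truncating a normal makes its sd smaller than the untruncated one, so a draw close to a bound can be stretched slightly past it after rescaling. One possible fix (a different technique, whole-sample rejection, not part of either answer) is to redraw the whole sample until the rescaled version fits. In this sketch, `exact_tnorm_reject()`, `max_tries` and `rtnorm_base()` are hypothetical names, and `rtnorm_base()` is a base-R inverse-CDF sampler swapped in for `msm::rtnorm`.

```r
# rtnorm_base() is a hypothetical base-R stand-in for msm::rtnorm():
# an inverse-CDF draw from a standard normal truncated to [a, b].
rtnorm_base <- function(n, a, b) {
  qnorm(runif(n, pnorm(a), pnorm(b)))
}

# Hypothetical helper: redraw the whole sample until the rescaled
# version lies inside [lower, upper]; give up after max_tries attempts.
exact_tnorm_reject <- function(n, mean, sd, lower, upper, max_tries = 1000) {
  a <- (lower - mean) / sd
  b <- (upper - mean) / sd
  for (i in seq_len(max_tries)) {
    x <- as.numeric(mean + sd * scale(rtnorm_base(n, a, b)))
    if (min(x) >= lower && max(x) <= upper) {
      return(x)
    }
  }
  stop("no in-bounds sample after ", max_tries, " tries")
}

set.seed(1)
y <- exact_tnorm_reject(1000, mean = 100, sd = 5, lower = 40, upper = 120)
summary(y)
```

This only works when the truncation is mild, as with sd = 5 (b = 4). With the iterative answer's sd = 15 (b ≈ 1.33), rescaling pushes draws near the upper bound well past 120, so this version will almost certainly run out of tries, and the one-draw-at-a-time approach is needed.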