I have a data frame DF that looks like this:
Min Max
10 80
20 90
40 120
I want to append a new Random column containing a randomly generated number between the Min and Max values of each row. The numbers should be sampled from an upside-down normal distribution (U-shaped) that excludes the middle values, as illustrated below.
The prototype below, which uses single variables, seems to work, but I'm stuck on how to apply it row-wise.
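min <- 1
max <- 20
# Beta(0.5, 0.5) is U-shaped on [0, 1]; rescale it to [min, max]
q <- min + (max - min) * rbeta(10000, 0.5, 0.5)
# Drop the draws inside the excluded middle range (5, 15).
# Fewer than 10000 values remain, so indexing with [1:10000] would pad with NA.
q <- q[!(q > 5 & q < 15)]
hist(q)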
You could try this iterative approach. Use the variables lower and upper to define the excluded middle range. Note that they are absolute values applied to every row, not positions relative to each row's Min and Max.
Start by creating a column of NA values. On each iteration of the loop, all NA values in the column are overwritten with samples from your distribution. The samples that fall within the excluded zone are then reset to NA, and the loop repeats until no NA values are left in the column.
DF <- data.frame(Min = c(10, 20, 40), Max = c(80, 90, 120))
lower <- 5
upper <- 15
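DF$sample <- rep(NA_real_, nrow(DF))  # NA marks rows that still need a draw
while (any(is.na(DF$sample))) {
  i <- which(is.na(DF$sample))
  # Redraw only the missing rows from a Beta(0.5, 0.5) rescaled to [Min, Max]
  DF$sample[i] <- DF$Min[i] + (DF$Max[i] - DF$Min[i]) * rbeta(length(i), 0.5, 0.5)
  # Reject draws inside the excluded zone so they are redrawn on the next pass
  DF$sample[DF$sample > lower & DF$sample < upper] <- NA
}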
DF
#> Min Max sample
#> 1 10 80 31.88867
#> 2 20 90 33.26248
#> 3 40 120 80.08321
Created on 2020-02-18 by the reprex package (v0.3.0)
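With lower <- 5 and upper <- 15 as absolute values, only the first row of this DF (range 10 to 80) can ever land in the excluded zone. If the intent is to exclude the middle of each row's own range, one possible variant is to do the rejection step on the unit interval and rescale afterwards. The following is a minimal sketch under that assumption: the helper name sample_u_shaped and the fractions excl_from and excl_to (0.25 and 0.75 here) are hypothetical choices for illustration, not part of the original question or answer.

```r
# Hypothetical helper: one U-shaped draw per row, rejecting draws that fall
# in the middle band of that row's own [lo, hi] range.
# excl_from / excl_to are assumed fractions of the range, not absolute values.
sample_u_shaped <- function(lo, hi, excl_from = 0.25, excl_to = 0.75) {
  stopifnot(length(lo) == length(hi), excl_from < excl_to)
  u <- rep(NA_real_, length(lo))           # draws on the unit interval
  while (anyNA(u)) {
    i <- which(is.na(u))
    u[i] <- rbeta(length(i), 0.5, 0.5)     # Beta(0.5, 0.5) is U-shaped on [0, 1]
    u[u > excl_from & u < excl_to] <- NA   # reject the middle band, redraw next pass
  }
  lo + (hi - lo) * u                       # rescale each draw to its row's range
}

set.seed(42)  # only to make the sketch reproducible
DF <- data.frame(Min = c(10, 20, 40), Max = c(80, 90, 120))
DF$Random <- sample_u_shaped(DF$Min, DF$Max)
DF
```

Because the excluded band is expressed as a fraction of each range, the same call works for rows with very different Min and Max values. Changing excl_from and excl_to widens or narrows the gap.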