I want to evaluate the probability density of data I've simulated.
dnorm()
in the following way:dist_mean <- 10
dist_sd <- 0.2
prob_density_on_x_val <- dnorm(x = 9.9,
mean = dist_mean,
sd = dist_sd)
prob_density_on_x_val
[1] 1.760327
dist_mean <- 10
dist_sd <- 0.2
## simulate 100,000 values from the normal distribution,
## given specific mean and standard deviation.
set.seed(123)
random_vals <- rnorm(n = 100000,
mean = dist_mean,
sd = dist_sd)
hist(random_vals)
library("pracma")
trapz(random_vals)
random_vals
[1] 1000009
In this comment, @Glen_b says using ecdf()
is the way to calculate probability in a range between two x values "a" and "b": ecdf(b)-ecdf(a)
. However, something doesn't make sense, because:
cdf <- ecdf(random_vals)
range_density <- cdf(10.2)-cdf(9.7)
range_density
[1] 0.77358
How is it possible that the probability density on a point value (x=9.9) was 1.76, but for a range 9.7<x<10.2
it's smaller (0.77)? Both distributions (both the one defined with dnorm
and the one simulated with rnorm
) have the same mean and sd.
So I think I'm missing something fundamental, and would be grateful for any help. Overall, it seems like a very simple question, but I can't find a straightforward solution despite a lot of reading and digging.
Thanks!
The thing I was missing was the distinction between:
dnorm()
is useful for)You can get the probability density function using the functions density
and approxfun
.
DensityFunction = approxfun(density(random_vals), rule=2)
DensityFunction(9.7)
[1] 0.6410087
plot(DensityFunction, xlim=c(9,11))
You can get the area under the curve using integrate
AreaUnderCurve = function(lower, upper) {
integrate(DensityFunction, lower=lower, upper=upper) }
AreaUnderCurve(10,11)
0.5006116 with absolute error < 6.4e-05
AreaUnderCurve(9.5,10.5)
0.9882601 with absolute error < 0.00011
You also ask:
How is it possible that that the probability density on point value (x=9.9) was 1.76, but for a range 9.7
The value of the pdf (1.76) is the height of the curve. The value you are getting for the range is the area under the curve. Since the width of the interval is 0.5, it is not a surprise that the area under the curve is less than the height.