This question is a follow up to the original thread (https://stats.stackexchange.com/questions/12209/percentage-of-overlapping-regions-of-two-normal-distributions)
I've modified the original code from the thread above to label the plots, but nothing much more. I'm running into issues with the code working with one set of inputs, but not another. As an R beginner, I'm just looking for help as to why.
mu1 <- 5
sd1 <- 1
mu2 <- 3
sd2 <- .75
This data set works.
When I change the numbers, I hit an issue:
mu1 <- .3439
sd1 <- .0005
mu2 <- .3420
sd2 <- .00075
There's clearly an overlap, but the integral is not computing.
How is the integral failing when the numbers change? and how can I remedy? Do I not have enough x points in xs for the integral to compute?
Full Code below:
min.f1f2 <- function(x, mu1, mu2, sd1, sd2) {
f1 <- dnorm(x, mean=mu1, sd=sd1)
f2 <- dnorm(x, mean=mu2, sd=sd2)
pmin(f1, f2)
}
mu1 <- .3439
sd1 <- .0005
mu2 <- .3420
sd2 <- .00075
xs <- seq(min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), .00001)
f1 <- dnorm(xs, mean=mu1, sd=sd1)
f2 <- dnorm(xs, mean=mu2, sd=sd2)
plot(xs, f1, type="l", ylim=c(0, max(f1,f2)), ylab="density")
lines(xs, f2, lty="dotted")
ys <- min.f1f2(xs, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
xs <- c(xs, xs[1])
ys <- c(ys, ys[1])
polygon(xs, ys, col="red")
### integrate to find % overlap
iP <- integrate(min.f1f2, -Inf, Inf, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
VaLue <- iP$value
VaLue <- sprintf("%.1f %%", 100*VaLue)
#percentage on plot (in middle of overlap.. half of ys height )
text(((mu2 + 3*sd2)+(mu1 - 3*sd1))/2, max(ys)/2, VaLue)
#label f1
text(mu1+sd1,max(f1),"HOLE")
#label f2
text(mu2+sd2,max(f2),"PEG")
I found that for whatever reason integration fails for small numbers when approaching infinity (or even much above or below the 3 sigma tails...).
I updated only the limits on the integral from -inf, inf to min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2) and I have an approximation again. Sure, it's not perfect, but I really only care about a handful of digits for my application.
Would really like to better understand why this failed in the first place when only the scale changed. Any ideas?
Updated code below.
min.f1f2 <- function(x, mu1, mu2, sd1, sd2) {
f1 <- dnorm(x, mean=mu1, sd=sd1)
f2 <- dnorm(x, mean=mu2, sd=sd2)
pmin(f1, f2)
}
#HOLE
mu1 <- .3439
sd1 <- .0005
#Peg
mu2 <- .3420
sd2 <- .00075
xs <- seq(min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), .00001)
f1 <- dnorm(xs, mean=mu1, sd=sd1)
f2 <- dnorm(xs, mean=mu2, sd=sd2)
plot(xs, f1, type="l", ylim=c(0, max(f1,f2)), ylab="density")
lines(xs, f2, lty="dotted")
ys <- min.f1f2(xs, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
xs <- c(xs, xs[1])
ys <- c(ys, ys[1])
polygon(xs, ys, col="red")
### integrate to find % overlap
iP <- integrate(min.f1f2, min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
VaLue <- iP$value
VaLue <- sprintf("%.1f %%", 100*VaLue)
#percentage on plot (in middle of overlap.. )
text(((mu2 + 3*sd2)+(mu1 - 3*sd1))/2, max(ys)/3.5, VaLue)
#label f1
text(mu1+sd1,max(f1),"HOLE")
#label f2
text(mu2+sd2,max(f2),"PEG")