Search code examples
rstatisticsnormal-distributionintegral

Integration error using R to evaluate overlapping distributions


This question is a follow up to the original thread (https://stats.stackexchange.com/questions/12209/percentage-of-overlapping-regions-of-two-normal-distributions)

I've modified the original code from the thread above to label the plots, but nothing much more. I'm running into issues with the code working with one set of inputs, but not another. As an R beginner, I'm just looking for help as to why.

mu1 <- 5
sd1 <- 1

mu2 <- 3
sd2 <- .75

This data set works.

Working Probability of fit

When I change the numbers, I hit an issue:

mu1 <- .3439
sd1 <- .0005

mu2 <- .3420
sd2 <- .00075

There's clearly an overlap, but the integral is not computing.

Failed Probability of fit

How is the integral failing when the numbers change? and how can I remedy? Do I not have enough x points in xs for the integral to compute?

Full Code below:

min.f1f2 <- function(x, mu1, mu2, sd1, sd2) {
  f1 <- dnorm(x, mean=mu1, sd=sd1)
  f2 <- dnorm(x, mean=mu2, sd=sd2)
  pmin(f1, f2)
}


mu1 <- .3439
sd1 <- .0005

mu2 <- .3420
sd2 <- .00075

xs <- seq(min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), .00001)
f1 <- dnorm(xs, mean=mu1, sd=sd1)
f2 <- dnorm(xs, mean=mu2, sd=sd2)

plot(xs, f1, type="l", ylim=c(0, max(f1,f2)), ylab="density")
lines(xs, f2, lty="dotted")
ys <- min.f1f2(xs, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
xs <- c(xs, xs[1])
ys <- c(ys, ys[1])
polygon(xs, ys, col="red")


### integrate to find % overlap
iP <- integrate(min.f1f2, -Inf, Inf, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
VaLue <- iP$value
VaLue <- sprintf("%.1f %%", 100*VaLue)

#percentage on plot (in middle of overlap.. half of ys height )
text(((mu2 + 3*sd2)+(mu1 - 3*sd1))/2, max(ys)/2, VaLue)
#label f1
text(mu1+sd1,max(f1),"HOLE")
#label f2
text(mu2+sd2,max(f2),"PEG")

Solution

  • I found that for whatever reason integration fails for small numbers when approaching infinity (or even much above or below the 3 sigma tails...).

    I updated only the limits on the integral from -inf, inf to min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2) and I have an approximation again. Sure, it's not perfect, but I really only care about a handful of digits for my application.

    Would really like to better understand why this failed in the first place when only the scale changed. Any ideas?

    Updated code below.

    min.f1f2 <- function(x, mu1, mu2, sd1, sd2) {
      f1 <- dnorm(x, mean=mu1, sd=sd1)
      f2 <- dnorm(x, mean=mu2, sd=sd2)
      pmin(f1, f2)
    }
    
    #HOLE 
    mu1 <- .3439
    sd1 <- .0005
    #Peg
    mu2 <- .3420
    sd2 <- .00075
    
    xs <- seq(min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), .00001)
    f1 <- dnorm(xs, mean=mu1, sd=sd1)
    f2 <- dnorm(xs, mean=mu2, sd=sd2)
    
    plot(xs, f1, type="l", ylim=c(0, max(f1,f2)), ylab="density")
    lines(xs, f2, lty="dotted")
    ys <- min.f1f2(xs, mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
    xs <- c(xs, xs[1])
    ys <- c(ys, ys[1])
    polygon(xs, ys, col="red")
    
    
    ### integrate to find % overlap
    iP <- integrate(min.f1f2, min(mu1 - 3*sd1, mu2 - 3*sd2), max(mu1 + 3*sd1, mu2 + 3*sd2), mu1=mu1, mu2=mu2, sd1=sd1, sd2=sd2)
    VaLue <- iP$value
    VaLue <- sprintf("%.1f %%", 100*VaLue)
    
    #percentage on plot (in middle of overlap.. )
    text(((mu2 + 3*sd2)+(mu1 - 3*sd1))/2, max(ys)/3.5, VaLue)
    #label f1
    text(mu1+sd1,max(f1),"HOLE")
    #label f2
    text(mu2+sd2,max(f2),"PEG")