Search code examples
rfunctionstatisticsbayesianintegral

Why my final function doesn't integrate to "1" when "normalized" in R?


I have a coding question about integrate(). It arises from a few steps:

First, I have an Initial function called Like. Second, I get the integral of Like and call the outcome norm.cons. Third, I divide the Like by norm.cons and call the outcome Like.2. Finally, I get the integral of Like.2. (All R code is provided below.)

Question:

By definition, the answer of my final step above should be "1". But why instead I get the following answer? 4.573253e-12

Here is my R code:

Like = function(x) sapply(lapply(x, dnorm, x = seq(1, 30), 2), prod) # Initial function

norm.cons = integrate(Like, -Inf, Inf)[[1]] # Integral of Initial Function

Like.2 = function(x) sapply(lapply(x, dnorm, x = seq(1, 30), 2), prod) / norm.cons # Deviding the initial function by its Integral

integrate(Like.2, -Inf, Inf)[[1]] # HERE Why Integral is not "1" ?

Solution

  • Your norm.cons variable is 0, which means anything you put in Like.2 results in NaN (division by 0). Try, for example,

    Like.2(3) # results in NaN
    

    The reason norm.cons is 0 is that it is 0 at almost all values except those around 200-300, and numerical integration is too imprecise to find the region where it is not 0. For example, try:

    Like(220) # 3.471218e-244
    Like(300) # 2.661608e-296
    
    # outside of that region:
    Like(200) # 0
    Like(320) # 0
    

    To fix this, set more reasonable bounds around both of your integrations (it won't hurt the calculation since almost none of your function's mass occurs outside that region). For example:

    Like = function(x) sapply(lapply(x, dnorm, x = c(250, 265, 259), 2), prod) # Initial function
    
    norm.cons = integrate(Like, 220, 300)[[1]] # Integral of Initial Function
    
    Like.2 = function(x) sapply(lapply(x, dnorm, x = c(250, 265, 259), 2), prod) / norm.cons
    
    integrate(Like.2, 100, 500)[[1]]
    

    This results in 0.9999309, very close to your expected value of 1. The difference is due to the imprecision of numerical integration, as well as arithmetic underflow (some values aren't 0 but are too small for R to represent). If you play around with the bounds of your integrals a bit you can get it even closer to 1.