Search code examples
rintegrationnumerical-methods

Best way to solve an integral including a nonparametric density and distribution


Suppose that I want to solve a function containing two integrals like (this is an example, the actual function is uglier)

enter image description here

where a and b are the boundaries, c and d are known parameters and f(x) and F(x) are the density and distribution of the random variable x. In my problem f(x) and F(x) are nonparametrically found, so that I know their values only for certain specific values of x. How would you set the integral?

I did:

# Create the data
val <- runif(300, min=1, max = 10) #use the uniform distribution
CDF <- (val - 1)/(10 - 1)
pdf <- 1 / (10 - 1)
data <- data.frame(val = val, CDF = CDF, pdf = pdf)

c = 2
d = 1

# Inner integral
integrand1 <- function(x) {
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (1 - FF)^(c/d) * ff
}

# Vectorize the inner integral
Integrand1 <- Vectorize(integrand1)

# Outer integral
integrand2 <- function(x){
  i <- which.min(abs(x - data$val))
  FF <- data$CDF[i]
  ff <- data$pdf[i]
  (quadgk(Integrand1, x, 10) / FF) * c * ff
}

# Vectorize the outer integral
Integrand2 <- Vectorize(integrand2)

# Solve
require(pracma)
quadgk(Integrand2, 1, 10)

The integral is extremely slow. Is there a better way to solve this? Thank you.

---------EDIT---------

In my problem the pdf and CDF are computed from a vector of values v as follows:

# Create the original data
v <- runif(300, min = 1, max = 10)
require(np)

# Compute the CDF and pdf
v.CDF.bw <- npudistbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")
v.pdf.bw <- npudensbw(dat = v, bandwidth.compute = TRUE, ckertype = "gaussian")

# Extend v on a grid (I add this step because the v vector in my data
# is not very large. In this way I approximate the estimated pdf and CDF
# on a grid)
val <- seq(from = min(v), to = max(v), length.out = 1000)
data <- data.frame(val)
CDF <- npudist(bws = v.CDF.bw, newdata = data$val, edat = data )
pdf <- npudens(bws = v.pdf.bw, newdata = data$val, edat = data )
data$CDF <- CDF$dist
data$pdf <- pdf$dens

Solution

  • Have you considered using approxfun?

    It takes vectors x and y and gives you a function that linearly interpolates between those. So for example, try

    x <- runif(1000)+runif(1000)+2*(runif(1000)^2)
    dx <- density(x)
    fa <- approxfun(dx$x,dx$y)
    curve(fa,0,2)
    fa(0.4) 
    

    You should be able to call it using your gridded evaluations. It may be faster than what you're doing (as well as more accurate)

    (edit: yes, as you say, splinefun should be fine if its fast enough for your needs)