Search code examples
rperformancevectorizationconvolutionlog-likelihood

Vectorizing S4 Class in R


I am trying to compute a log-likelihood, where the likelihood is a convolution of two distributions, and the distribution parameters depend on the value of that data point. More formally, I believe my data follows a likelihood which is the sum of two gamma distributions, which have separate shape and scale parameters and the shape parameter is a function of data. Written mathematically, yi=pGamma(k1+b1 xi, t1)+(1-p)Gamma(k2+b2 xi, t2). Note the plus sign is an arithmetic operation on the random variables, not between the CDFs. This is not a mixture model. The likelihood is a convolution between weighted random variables that have to be done per observation, which is very slow.

I currently am using the distr package, and this is what I have to compute the likelihood.

  sapply(1:10000, function(i){distr::d(p*Gammad(shape = k1+b1*x[i], scale = t1)+(1-p)*Gammad(shape = k2+b2*x[i], scale = t2))(y[i])})

Is there a way to vectorize this? Is there any other way to speed up the code?

When I try

distr::d(Vectorize(p*Gammad(shape = k1+b1*x, scale = t1)+(1-p)*Gammad(shape = k2+b2*x, scale = t2)))(y)

I get the error "Error in .multm(e1, e2, "AbscontDistribution") : length of operator must be 1"


Solution

  • First, note that if

    A ~ Gamma(k1 + b1*xi, t1)
    B ~ Gamma(k2 + b2*xi, t2)
    

    then

    p*A ~ Gamma(k1 + b1*xi, p*t1)
    (1 - p)*B ~ Gamma(k2 + b2*xi, (1 - p)*t2)
    

    The closed-form PDF of the sum of two gamma random variates is known and involves Kummer's (confluent hypergeometric) function.

    Using chf_1F1 from the scModels package, I believe the following function would return the log-likelihoods.

    library(scModels)
    
    LL <- function(k1, b1, t1, k2, b2, t2, x, p, y) {
      a1 <- k1 + b1*x
      a2 <- k2 + b2*x
      a <- a1 + a2
      b1 <- 1/(p*t1)
      b2 <- 1/((1 - p)*t2)
      a1*log(b1) + a2*log(b2) - lgamma(a) - b1*y + (a - 1)*log(y) + chf_1F1((b1 - b2)*y, a2, a)
    }