We can write a simple function that generates a random variable according to any user-specified distribution, including those with a varying number of distributional parameters:
random_variates <- function(n, distribution, ...){
distribution(n, ...)
}
set.seed(123)
# negative binomial, 2 parameters
random_variates(n = 5, distribution = rnbinom, mu = 5, size = 0.5)
# [1] 1 3 0 34 3
# Poisson, 1 parameter
random_variates(n = 5, distribution = rpois, lambda = 5)
# [1] 9 8 6 7 1
Conversely, we can also write a function that uses optim()
to estimate maximum likelihood estimates of the parameters given the data, but for each distribution there must be a distribution-specific log likelihood function:
# Log likelihood function for Negative Binomial
nb_ll <- function(params, data){
-sum(log(dnbinom(data, mu = params[1], size = params[2])))
}
# example data
z_vals <- c(5, 0, 0, 3, 0, 0, 7, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0,
3, 2, 0, 0, 1, 2, 0, 0, 2, 3, 0, 0, 0, 2, 1, 0, 2, 0, 0)
#MLE estimates
optim(c(0.01, 0.01), nb_ll, data = z_vals)[["par"]]
# [1] 0.973937 0.476989
I would like to combine these two ideas and write a likelihood function that can be passed through optim()
and allows the user to specify the distribution, without the need for distribution-specific likelihood functions. Something like:
# this of course doesn't work
any_ll <- function(params, distribution, data, ...){
-sum(log(distribution(data, ...)))
}
optim(c(0.01), any_ll, data = z_vals, distribution = dpois)
optim(c(0.01, 0.01), any_ll, data = z_vals, distribution = dnbinom)
Is there any way to do this in R?
Here is one approach that can be considered
z_vals <- c(5, 0, 0, 3, 0, 0, 7, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0,
3, 2, 0, 0, 1, 2, 0, 0, 2, 3, 0, 0, 0, 2, 1, 0, 2, 0, 0)
any_ll <- function(params, FUN, data)
{
text <- paste0("-sum(log(FUN(data", paste0(",", params, collapse = ""), ")))")
eval(parse(text = text))
}
optim(c(0.01), any_ll, data = z_vals, FUN = dpois)
$par
[1] 0.9735
$value
[1] 60.83354
$counts
function gradient
40 NA
$convergence
[1] 0
$message
NULL
optim(c(1, 1), any_ll, data = z_vals, FUN = dgamma)
$par
[1] 1.000000 1.027027
$value
[1] 36.98661
$counts
function gradient
181 NA
$convergence
[1] 0
$message
NULL