I am trying to implement an empirical bayesian ML-II(Maximum likelihood estimation Type II)method for estimating prior distribution parameters from historical data
Where:
According to the steps, I need to first integrate to find the expression of the marginal distribution, and then find the extreme value of this expression to estimate the parameters of the prior distribution.
Extreme values can be achieved using methods such as scipy.optimize
. So the question is how do we integrate this?
Here's a go at this using symfit
. As an example I choose to sample from a bivariate normal distribution with no covariance.
import numpy as np
import matplotlib.pyplot as plt
from symfit import Model, Fit, Parameter, Variable, integrate, oo
from symfit.distributions import Gaussian
from symfit.core.objectives import LogLikelihood
# Make variables and parameters
x = Variable('x')
y = Variable('y')
m = Variable('m')
x0 = Parameter('x0', value=0.6, min=0.5, max=0.7)
sig_x = Parameter('sig_x', value=0.1)
y0 = Parameter('y0', value=0.7, min=0.6, max=0.9)
sig_y = Parameter('sig_y', value=0.05)
pdf = Gaussian(x=x, mu=x0, sig=sig_x) * Gaussian(x=y, mu=y0, sig=sig_y)
marginal = integrate(pdf, (y, -oo, oo), conds='none')
print(pdf)
print(marginal)
model = Model({m: marginal})
# Draw 10000 samples from a bivariate distribution
mean = [0.59, 0.8]
cov = [[0.11**2, 0], [0, 0.23**2]]
xdata, ydata = np.random.multivariate_normal(mean, cov, 10000).T
# We provide only xdata to the model
fit = Fit(model, xdata, objective=LogLikelihood)
fit_result = fit.execute()
print(fit_result)
xaxis = np.linspace(0, 1.0)
plt.hist(xdata, bins=100, density=True)
plt.plot(xaxis, model(x=xaxis, **fit_result.params).m)
plt.show()
This prints the following for the pdf and the marginal distribution:
>>> exp(-(-x0 + x)**2/(2*sig_x**2))*exp(-(-y0 + y)**2/(2*sig_y**2))/(2*pi*Abs(sig_x)*Abs(sig_y))
>>> sqrt(2)*sig_y*exp(-(-x0 + x)**2/(2*sig_x**2))/(2*sqrt(pi)*Abs(sig_x)*Abs(sig_y))
And for the fit results:
Parameter Value Standard Deviation
sig_x 1.089585e-01 7.704533e-04
sig_y 5.000000e-02 nan
x0 5.905688e-01 -0.000000e+00
Fitting status message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Number of iterations: 9
Regression Coefficient: nan
You can see that x0
and sig_x
have been obtained properly, but no information can be obtained about the parameter to do with y
. I think that makes sense in this example since there is no correlation, but I'll leave you to struggle with those kinds of details ;).