Tags: python, scipy, sympy, bayesian, pymc3

How do I implement Type II maximum likelihood estimation?


I am trying to implement an empirical Bayes ML-II (Type II maximum likelihood) method for estimating the parameters of a prior distribution from historical data, where:

  1. π(θ) is an expression for the prior distribution
  2. p(x|θ) is an expression for the data distribution
  3. m(x) is an expression for the marginal distribution

According to the ML-II procedure, I first need to integrate out θ to obtain an expression for the marginal distribution, and then maximize that expression to estimate the parameters of the prior. The maximization can be done with methods such as scipy.optimize. So the question is: how do I perform the integration?

m(x) = ∫ p(x|θ) π(θ) dθ
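For a concrete worked case, here is a sketch of both steps using sympy for the integral and scipy.optimize for the maximization. The model is my own illustration, not from the question: a Normal(μ, τ²) prior on θ and a Normal(θ, σ²) data distribution with σ known, for which the marginal works out to Normal(μ, σ² + τ²).

```python
import numpy as np
import sympy as sp
from scipy import optimize, stats

# --- Symbolic step: integrate theta out of p(x|theta) * pi(theta) ---
# Illustrative model (an assumption, not from the question):
#   prior       pi(theta)  = Normal(mu, tau^2)
#   data dist.  p(x|theta) = Normal(theta, sigma^2), sigma known
x, theta, mu = sp.symbols('x theta mu', real=True)
sigma, tau = sp.symbols('sigma tau', positive=True)

prior = sp.exp(-(theta - mu)**2 / (2 * tau**2)) / sp.sqrt(2 * sp.pi * tau**2)
lik = sp.exp(-(x - theta)**2 / (2 * sigma**2)) / sp.sqrt(2 * sp.pi * sigma**2)
# The marginal m(x); for this conjugate pair it is Normal(mu, sigma^2 + tau^2)
marginal = sp.simplify(sp.integrate(prior * lik, (theta, -sp.oo, sp.oo)))

# --- Numeric step: maximize the marginal likelihood over (mu, tau) ---
sigma_known = 1.0
rng = np.random.default_rng(0)
true_mu, true_tau = 2.0, 0.5
thetas = rng.normal(true_mu, true_tau, size=500)  # simulated "historical" effects
data = rng.normal(thetas, sigma_known)            # observed historical data

def neg_log_marginal(params):
    m_, t_ = params
    scale = np.sqrt(sigma_known**2 + t_**2)
    return -np.sum(stats.norm.logpdf(data, loc=m_, scale=scale))

res = optimize.minimize(neg_log_marginal, x0=[0.0, 1.0],
                        bounds=[(None, None), (1e-6, None)])
print(res.x)  # ML-II estimates of (mu, tau)
```

The same pattern applies to non-conjugate models, except that sympy may not find a closed form, in which case the inner integral has to be done numerically (e.g. scipy.integrate.quad) inside the objective.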


Solution

  • Here's a go at this using symfit. As an example, I chose to sample from a bivariate normal distribution with no covariance.

    import numpy as np
    import matplotlib.pyplot as plt
    from symfit import Model, Fit, Parameter, Variable, integrate, oo
    from symfit.distributions import Gaussian
    from symfit.core.objectives import LogLikelihood
    
    # Make variables and parameters
    x = Variable('x')
    y = Variable('y')
    m = Variable('m')
    x0 = Parameter('x0', value=0.6, min=0.5, max=0.7)
    sig_x = Parameter('sig_x', value=0.1)
    y0 = Parameter('y0', value=0.7, min=0.6, max=0.9)
    sig_y = Parameter('sig_y', value=0.05)
    
    # Build the joint pdf, then integrate out y to get the marginal in x
    pdf = Gaussian(x=x, mu=x0, sig=sig_x) * Gaussian(x=y, mu=y0, sig=sig_y)
    marginal = integrate(pdf, (y, -oo, oo), conds='none')
    print(pdf)
    print(marginal)
    
    model = Model({m: marginal})
    
    # Draw 10000 samples from a bivariate normal distribution
    mean = [0.59, 0.8]
    cov = [[0.11**2, 0], [0, 0.23**2]]
    xdata, ydata = np.random.multivariate_normal(mean, cov, 10000).T
    
    # We provide only xdata to the model
    fit = Fit(model, xdata, objective=LogLikelihood)
    fit_result = fit.execute()
    print(fit_result)
    
    xaxis = np.linspace(0, 1.0)
    plt.hist(xdata, bins=100, density=True)
    plt.plot(xaxis, model(x=xaxis, **fit_result.params).m)
    plt.show()
    

    This prints the following for the pdf and the marginal distribution:

    >>> exp(-(-x0 + x)**2/(2*sig_x**2))*exp(-(-y0 + y)**2/(2*sig_y**2))/(2*pi*Abs(sig_x)*Abs(sig_y))
    >>> sqrt(2)*sig_y*exp(-(-x0 + x)**2/(2*sig_x**2))/(2*sqrt(pi)*Abs(sig_x)*Abs(sig_y))
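    Note that sig_y cancels out of that marginal entirely: what remains is just a normalized 1D Gaussian in x, which is why the fit below can say nothing about sig_y. A quick sympy check of the printed expression (independent of symfit; Abs() is dropped since the widths are positive):

```python
import sympy as sp

x, x0 = sp.symbols('x x0', real=True)
sig_x, sig_y = sp.symbols('sig_x sig_y', positive=True)

# The marginal exactly as symfit printed it (Abs == identity for positive widths)
marginal = (sp.sqrt(2) * sig_y * sp.exp(-(x - x0)**2 / (2 * sig_x**2))
            / (2 * sp.sqrt(sp.pi) * sig_x * sig_y))
# An ordinary normalized Gaussian in x
gauss_x = sp.exp(-(x - x0)**2 / (2 * sig_x**2)) / sp.sqrt(2 * sp.pi * sig_x**2)

diff = sp.simplify(marginal - gauss_x)
print(diff)  # 0 -> sig_y has dropped out completely
```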
    

    And for the fit results:

    Parameter Value        Standard Deviation
    sig_x     1.089585e-01 7.704533e-04
    sig_y     5.000000e-02 nan
    x0        5.905688e-01 -0.000000e+00
    Fitting status message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
    Number of iterations:   9
    Regression Coefficient: nan
    

    (Plot: histogram of xdata with the fitted marginal overlaid.)

    You can see that x0 and sig_x have been obtained properly, but no information can be obtained about the parameters associated with y. I think that makes sense in this example since there is no correlation, but I'll leave you to struggle with those kinds of details ;).
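    As a sanity check that doesn't depend on symfit, the same x0/sig_x estimates can be recovered by minimizing the negative log-likelihood of the x marginal directly with scipy.optimize (fresh random data here, so the numbers will differ slightly from the table above):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(42)
xdata = rng.normal(0.59, 0.11, size=10000)  # same x marginal as in the example

def nll(params):
    x0_, sig_x_ = params
    return -np.sum(stats.norm.logpdf(xdata, loc=x0_, scale=sig_x_))

res = optimize.minimize(nll, x0=[0.5, 0.2], bounds=[(None, None), (1e-6, None)])
print(res.x)  # should land close to (0.59, 0.11)
```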