Search code examples
pythonscipystatisticsstatsmodelsscientific-computing

Fitting empirical distribution against a hyperbolic distribution using scipy.stats


At the moment, I am fitting empirical distributions against theoretical ones as explained in Fitting empirical distribution to theoretical ones with Scipy (Python)?

Using the scipy.stats distributions, the results show a good fit for the hyperbolic secant distribution.

Here's my current approach using some of scipys distributions:

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt


# Sample data with random numbers of hypsecant distribution
data = scipy.stats.hypsecant.rvs(size=8760, loc=1.93, scale=7.19)

# Distributions to check
dist_names = ['gausshyper', 'norm', 'gamma', 'hypsecant']

for dist_name in dist_names:

    dist = getattr(scipy.stats, dist_name)

    # Fit a distribution to the data
    param = dist.fit(data)

    # Plot the histogram
    plt.hist(data, bins=100, normed=True, alpha=0.8, color='g')

    # Plot and save the PDF
    xmin, xmax = plt.xlim()
    x = np.linspace(xmin, xmax, 100)
    p = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1])
    plt.plot(x, p, 'k', linewidth=2)
    title = 'Distribution: ' + dist_name
    plt.title(title)
    plt.savefig('fit_' + dist_name + '.png')
    plt.close()

which delivers plots like the following:

enter image description here

But I would like to test the fit against a (generalized) hyperbolic distribution as well since I have the assumptions that it might deliver an even better fit.

Is there a hyperbolic distribution in scipy.stats that I can use? Or is there any workaround?

Using other packages would also be an option.

Thanks in advance!


Solution

  • As your distribution is not in scipy.stats you can either add it to the package or try doing things "by hand".

    For the former have a look at the source code of the scipy.stats package - it might not be all that much work to add a new distribution!

    For the latter option you can use a maximum Likelihood approach. To do so define first a method giving you the pdf of the distribution. Based on the pdf construct a function calculating the log likelihood of the data given specific parameters of the distribution. Finally fit your model to the data by maximizing this log likelihood function using scipy.optimize.