At the moment, I am fitting empirical distributions against theoretical ones as explained in Fitting empirical distribution to theoretical ones with Scipy (Python)?
Using the scipy.stats distributions, the results show a good fit for the hyperbolic secant distribution.
Here's my current approach using some of scipys distributions:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
# Sample data with random numbers of hypsecant distribution
data = scipy.stats.hypsecant.rvs(size=8760, loc=1.93, scale=7.19)
# Distributions to check
dist_names = ['gausshyper', 'norm', 'gamma', 'hypsecant']
for dist_name in dist_names:
dist = getattr(scipy.stats, dist_name)
# Fit a distribution to the data
param = dist.fit(data)
# Plot the histogram
plt.hist(data, bins=100, normed=True, alpha=0.8, color='g')
# Plot and save the PDF
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = dist.pdf(x, *param[:-2], loc=param[-2], scale=param[-1])
plt.plot(x, p, 'k', linewidth=2)
title = 'Distribution: ' + dist_name
plt.title(title)
plt.savefig('fit_' + dist_name + '.png')
plt.close()
which delivers plots like the following:
But I would like to test the fit against a (generalized) hyperbolic distribution as well since I have the assumptions that it might deliver an even better fit.
Is there a hyperbolic distribution in scipy.stats that I can use? Or is there any workaround?
Using other packages would also be an option.
Thanks in advance!
As your distribution is not in scipy.stats
you can either add it to the package or try doing things "by hand".
For the former have a look at the source code of the scipy.stats
package - it might not be all that much work to add a new distribution!
For the latter option you can use a maximum Likelihood approach. To do so define first a method giving you the pdf of the distribution. Based on the pdf construct a function calculating the log likelihood of the data given specific parameters of the distribution. Finally fit your model to the data by maximizing this log likelihood function using scipy.optimize
.