I have a pandas Series and I'd like to fit a density to its histogram. Question: is there a slick way to use the values from np.histogram() to achieve this? (See the Update below.)
My current problem is that my KDE fit has (seemingly) unwanted kinks, as shown in the second plot below. Based on the histogram, shown in the first figure, I was hoping for a KDE fit that decreases monotonically. My current code is below. Thanks in advance!
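```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde as kde

# df is a pandas DataFrame and var is the name of the column of interest.
df[var].hist()
plt.show()  # shows the original histogram (pandas default: bins=10)

density = kde(df[var])                 # Gaussian KDE fitted to the raw values
xs = np.arange(0, df[var].max(), 0.1)  # evaluation grid
ys = density(xs)
plt.plot(xs, ys)  # a pdf with kinks
plt.show()
```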
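Note that df[var].hist() draws raw counts while gaussian_kde returns a probability density, so the two plots are on different vertical scales. A minimal sketch, assuming synthetic exponential data stands in for df[var] (the column name "x" is hypothetical), that puts the histogram on a density scale with density=True so the KDE can be compared with it bin by bin:

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Hypothetical stand-in for df[var]: 1,000 exponential draws.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.exponential(scale=2.0, size=1000)})
var = "x"

# Histogram normalized to a density (total area 1) instead of raw counts.
hist_density, edges = np.histogram(df[var], bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

# KDE evaluated at the same bin centers, directly comparable to hist_density.
density = gaussian_kde(df[var])
kde_at_centers = density(centers)

print((hist_density * widths).sum())    # ≈ 1.0
# Somewhat below 1: kernels centered near 0 leak mass below the smallest value.
print((kde_at_centers * widths).sum())
```

For plotting, df[var].hist(bins=100, density=True) followed by plt.plot(xs, density(xs)) should overlay the two curves on the same scale, since pandas forwards extra keyword arguments to matplotlib's hist.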
Alternatively, is there a slick way to use

```python
count, div = np.histogram(df[var])
```

and then scale the count array so that kde() can be applied to it?
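On the np.histogram route: in SciPy 1.2 and later, gaussian_kde accepts a weights argument, so the counts don't need to be rescaled by hand; each bin center can simply be weighted by its count. A minimal sketch, assuming bin centers are an acceptable proxy for the raw values (this discards within-bin detail, so it is coarser than fitting df[var] directly) and using hypothetical exponential data in place of df[var]:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in for df[var].
rng = np.random.default_rng(1)
values = rng.exponential(scale=2.0, size=5000)

# Bin the data; count[i] values fall between div[i] and div[i + 1].
count, div = np.histogram(values, bins=100)
centers = 0.5 * (div[:-1] + div[1:])

# With weights, SciPy sizes the bandwidth from the effective sample size of
# the normalized weights, which is far smaller than the number of
# observations here. Passing Scott's factor for n = count.sum() (1-D case:
# n ** (-1/5)) keeps the smoothing comparable to a fit on the raw values.
scott_factor = count.sum() ** (-1 / 5)

# Weighted KDE: each bin center counts as often as its bin count.
# gaussian_kde normalizes the weights itself, so raw counts are fine.
density_from_hist = gaussian_kde(centers, weights=count, bw_method=scott_factor)

# Reference fit on the raw values, for comparison.
density_from_raw = gaussian_kde(values)

xs = np.linspace(0, values.max(), 200)
ys_hist = density_from_hist(xs)
ys_raw = density_from_raw(xs)
```

Leaving out bw_method also works, but then the histogram-based fit comes out noticeably smoother than the raw-data fit because of the smaller effective sample size.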
Update: Based on cel's comment below (it should have been obvious, but I missed it!), I was implicitly under-binning by relying on the default bins=10 in pandas.Series.hist(). For the updated plot I used

```python
df[var].hist(bins=100)
```
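To see how under-binning can hide structure that the KDE picks up, here is a small sketch on hypothetical data: a smooth exponential decay plus a narrow bump near 5.55. The bump and all parameters are made up for illustration; the point is that 10 wide bins can look monotone while 100 narrow bins over the same range do not:

```python
import numpy as np

# Hypothetical data: exponential decay plus a narrow bump near 5.55.
rng = np.random.default_rng(42)
values = np.concatenate([
    rng.exponential(scale=2.0, size=100_000),
    rng.normal(loc=5.55, scale=0.1, size=1_000),
])

# Same range for both histograms, so only the bin width differs.
counts10, _ = np.histogram(values, bins=10, range=(0, 10))
counts100, _ = np.histogram(values, bins=100, range=(0, 10))

# Width-1.0 bins: the bump is absorbed into one bin and the counts
# still decrease from left to right.
print(np.all(np.diff(counts10) < 0))    # True

# Width-0.1 bins: the bump (and tail noise) shows up as local rises.
print(np.all(np.diff(counts100) <= 0))  # False
```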
I'll leave this post up in case others find it useful, but I won't mind if it gets taken down as 'too localized', etc.
The problem was under-binning, as cel pointed out in the comments above. Setting bins=100 in pd.Series.hist(), which defaults to bins=10, made this clear.
See also: http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width
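NumPy implements several of the bin-count rules from that Wikipedia section as string values for bins, for example 'sturges', 'fd' (Freedman-Diaconis) and 'auto'. A sketch comparing them on hypothetical exponential data standing in for df[var]:

```python
import numpy as np

# Hypothetical stand-in for df[var].
rng = np.random.default_rng(0)
values = rng.exponential(scale=2.0, size=10_000)

# Number of bins each rule of thumb suggests for this sample.
n_bins = {}
for rule in ("sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(values, bins=rule)
    n_bins[rule] = len(edges) - 1
    print(f"{rule}: {n_bins[rule]} bins")
```

Any of these edge arrays can be passed as df[var].hist(bins=edges), since pandas accepts either an integer or a sequence of bin edges for bins.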