I'm using HoloViews with the Bokeh backend for interactive visualizations. I have a histogram defined by edges and frequency data. What is an elegant way to overlay the histogram with its cumulative distribution function (CDF) curve?
I tried using the cumsum option in hv.dim, but I don't think I'm doing it right. The help simply says:
Help on function cumsum in module holoviews.util.transform:
cumsum(self, **kwargs)
My code looks something like this:
df_hist = pd.DataFrame(columns=['edges', 'freq'])
df_hist['edges'] = [-2, -1, 0, 1, 2]
df_hist['freq'] = [1, 3, 5, 3, 1]
hv.Histogram((df_hist.edges, df_hist.freq))
The result is a histogram plot.
Is there something like a...
hv.Histogram((df_hist.edges, df_hist.freq), type='cdf')
... to show the cumulative distribution?
One possible solution is to use the histogram operation with cumulative=True, as follows:
from holoviews.operation import histogram
histogram(hv.Histogram((df_hist.edges, df_hist.freq)), cumulative=True)
More info on transforming elements here:
http://holoviews.org/user_guide/Transforming_Elements.html
A more general solution is to turn the original data into an hv.Dataset() and apply the same operation:
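import holoviews as hv
import seaborn as sns
from holoviews.operation import histogram
hv.extension('bokeh')
# Load the example data and wrap the raw column in a Dataset
iris = sns.load_dataset('iris')
hv_data = hv.Dataset(iris['petal_width'])
# Bin the raw values and accumulate the counts
histogram(hv_data, cumulative=True)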
But I like using the hvplot library, which is built on top of HoloViews, even more:
import hvplot
import hvplot.pandas
iris['petal_width'].hvplot.hist(cumulative=True)
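If you want the curve drawn on top of the bars, as the question asks, rather than a cumulative histogram, one option is to compute the running total yourself and overlay it with the * operator. This is only a minimal sketch: the helper names cumulative and hist_with_cdf and the 'Cumulative' dimension label are made up for illustration, and it assumes the x values are used as given in the question (HoloViews treats equal-length x and frequency arrays as bin centers).

```python
from itertools import accumulate


def cumulative(freqs, normalize=True):
    """Return the running total of freqs, optionally scaled to end at 1.0."""
    running = list(accumulate(freqs))
    total = running[-1]
    if normalize:
        return [value / total for value in running]
    return running


def hist_with_cdf(x, freqs, normalize=False):
    """Overlay a Histogram with a Curve of the running total (hypothetical helper)."""
    # Imported lazily so cumulative() can be used without HoloViews installed
    import holoviews as hv

    hist = hv.Histogram((x, freqs))
    # 'x' and 'Cumulative' are illustrative dimension names, not required ones
    curve = hv.Curve((x, cumulative(freqs, normalize)), 'x', 'Cumulative')
    return hist * curve


# Running total for the frequencies in the question
print(cumulative([1, 3, 5, 3, 1], normalize=False))  # [1, 4, 9, 12, 13]
```

With the question's data, hist_with_cdf(df_hist.edges, df_hist.freq) should return the overlay after hv.extension('bokeh') has been called. Passing normalize=True gives a curve that ends at 1.0, but it then shares a y-axis with the raw counts, so the unnormalized running total is the default here.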