pandas, histogram, series

Are there functions to retrieve the histogram counts of a Series in pandas?


There is a method to plot Series histograms, but is there a function to retrieve the histogram counts to do further calculations on top of it?

I keep using NumPy's functions for this and converting the result to a DataFrame or Series whenever I need it. It would be nice to stay with pandas objects the whole time.


Solution

  • If your Series is discrete, you can use value_counts:

    In [11]: s = pd.Series([1, 1, 2, 1, 2, 2, 3])
    
    In [12]: s.value_counts()
    Out[12]:
    2    3
    1    3
    3    1
    dtype: int64
    

    For discrete data like this, s.hist() shows essentially the same information as s.value_counts().sort_index().plot(kind='bar').

    If the Series holds floats, an admittedly hacky solution is to use groupby, here rounding each value down to the nearest 0.5 to form the bins:

    s.groupby(lambda i: np.floor(2*s[i]) / 2).count()
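
For float data, a less hacky route may be to bin explicitly and then count. The sketch below is only an illustration: the sample values and the 0.5-wide bin edges are made up for the example, and it assumes a reasonably recent pandas where `Series.value_counts` accepts `bins` and `pd.cut` is available.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([0.1, 0.4, 0.6, 0.7, 1.2, 1.3, 1.9])

# Option 1: let pandas pick equal-width bins; the result is indexed by interval.
counts_auto = s.value_counts(bins=4, sort=False)

# Option 2: explicit edges with pd.cut, then count per bin.
edges = np.arange(0.0, 2.5, 0.5)             # 0.0, 0.5, 1.0, 1.5, 2.0
binned = pd.cut(s, bins=edges, right=False)  # left-closed bins, like the floor trick
counts_cut = binned.value_counts().sort_index()

# Option 3: keep using np.histogram, but wrap the result in a Series
# indexed by the left edge of each bin.
hist, bin_edges = np.histogram(s, bins=edges)
counts_np = pd.Series(hist, index=bin_edges[:-1])

print(counts_auto)
print(counts_cut)
print(counts_np)
```

With these edges, options 2 and 3 should give the same counts as the groupby approach above. The main difference is the index: `pd.cut` labels each bin with an interval, while the `np.histogram` wrapper keeps the left edge as a plain float.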