Search code examples
pythonpandas

How to label or rename bin ranges in a series output from value count


In a series or df column, I want to count the number of values that fit within predefined bins (easy) and meaningfully label the bin values (problem).

import pandas as pd

data = [{'A': 1, 'B': "Jim"}, {'A': 5, 'B': "Jim"}, {'A': 2, 'B': "Bob"}, {'A': 3, 'B': "Bob"}]
df = pd.DataFrame(data)

mBins = [-1, 2, 4, 6]
mLabels = ["0-2", "3-4", "5-6"]

simple_VC = df["A"].value_counts(bins=mBins)
Out[25]:   # ugly bin values
(-1.001, 2.0]    2
(2.0, 4.0]       1
(4.0, 6.0]       1

# Wanted more meaningful bin values:
0-2    2
3-4    1
5-6    1

I've tried using pd.cut, which allows me to label the bins, but I'm not sure how to use this in a value count. I've also tried to rename, but I don't know how to specify values like (4.0, 6.0], which are neither text or non-text.

How do I label the binned value counts - if possible during the value count, and how to rename bin ranges?


Solution

  • The bins parameter is actually just a convenience that calls cut internally.

    You can therefor cut the values manually and use the labels parameter before value_counts:

    out = pd.cut(df['A'], mBins, labels=mLabels).value_counts()
    

    If you don't want to provide the labels, you could also infer them from the intervals and rename:

    from math import ceil
    
    out = (df['A'].value_counts(bins=mBins)
           .rename(lambda x: f'{ceil(x.left+1)}-{ceil(x.right)}')
          )
    

    NB. this is an example for integer values/bounds.

    Output:

    A
    0-2    2
    3-4    1
    5-6    1
    Name: count, dtype: int64