In a series or df column, I want to count the number of values that fit within predefined bins (easy) and meaningfully label the bin values (problem).
import pandas as pd
data = [{'A': 1, 'B': "Jim"}, {'A': 5, 'B': "Jim"}, {'A': 2, 'B': "Bob"}, {'A': 3, 'B': "Bob"}]
df = pd.DataFrame(data)
mBins = [-1, 2, 4, 6]
mLabels = ["0-2", "3-4", "5-6"]
simple_VC = df["A"].value_counts(bins=mBins)
Out[25]: # ugly bin values
(-1.001, 2.0] 2
(2.0, 4.0] 1
(4.0, 6.0] 1
# Wanted more meaningful bin values:
0-2 2
3-4 1
5-6 1
I've tried using pd.cut, which allows me to label the bins, but I'm not sure how to use this in a value count. I've also tried to rename, but I don't know how to specify values like (4.0, 6.0], which are neither text or non-text.
How do I label the binned value counts - if possible during the value count, and how to rename bin ranges?
The bins
parameter is actually just a convenience that calls cut
internally.
You can therefor cut
the values manually and use the labels
parameter before value_counts
:
out = pd.cut(df['A'], mBins, labels=mLabels).value_counts()
If you don't want to provide the labels, you could also infer them from the intervals and rename
:
from math import ceil
out = (df['A'].value_counts(bins=mBins)
.rename(lambda x: f'{ceil(x.left+1)}-{ceil(x.right)}')
)
NB. this is an example for integer values/bounds.
Output:
A
0-2 2
3-4 1
5-6 1
Name: count, dtype: int64