python pandas sorting binning

Sort data ranges with pandas.cut


I am trying to understand how to build a table from data I have binned with pandas.cut so that the bin ranges appear in the correct order. I generate random ages with the following code:

import numpy as np
import pandas as pd
ages = np.random.standard_normal(1000)*20+30
ages[ages<0]=0
ages[ages>120]=120

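I bin the data using these lines:

# cast with astype first: recent pandas versions refuse to truncate non-integer floats via dtype=int
ages = pd.Series(ages.astype(int))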
ages_cut = pd.cut(ages,[0,20,40,60,80,100,120])

However, when I call ages_cut.value_counts(), the age ranges in the resulting table are in the wrong order:

(20, 40]      379
(0, 20]       268
(40, 60]      233
(60, 80]       56
(80, 100]       3
(100, 120]      0
dtype: int64

Solution

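  • In addition to the comment by @QuangHoang, you can call value_counts with the bins parameter and sort=False, which keeps the bins in ascending order instead of sorting them by count. From the documentation: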

    bins : int, optional

    Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.

    >>> ages.value_counts(bins=[0,20,40,60,80,100,120], sort=False)
    (-0.001, 20.0]    334
    (20.0, 40.0]      382
    (40.0, 60.0]      224
    (60.0, 80.0]       54
    (80.0, 100.0]       6
    (100.0, 120.0]      0
    dtype: int64
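
If you would rather keep the pd.cut result from the question, the ordering can also be fixed after the fact. The reason the table looks shuffled is that value_counts sorts by count (descending) by default. The sketch below is a minimal illustration, not the only way to do it: the fixed seed, np.clip, and the variable names counts_sorted and counts_unsorted are assumptions added here so the example is repeatable.

```python
import numpy as np
import pandas as pd

# Same setup as in the question; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
ages = rng.standard_normal(1000) * 20 + 30
ages = np.clip(ages, 0, 120)           # same effect as the two masking lines
ages = pd.Series(ages.astype(int))     # truncate to whole years

bins = [0, 20, 40, 60, 80, 100, 120]
ages_cut = pd.cut(ages, bins)

# value_counts() orders rows by count; sort_index() puts them back in bin
# order, because pd.cut returns an ordered categorical of intervals.
counts_sorted = ages_cut.value_counts().sort_index()

# Alternatively, skip the sort-by-count step altogether.
counts_unsorted = ages_cut.value_counts(sort=False)

print(counts_sorted)
```

Note that pd.cut builds intervals that are open on the left, so an age of exactly 0 falls outside (0, 20] and is not counted. Passing include_lowest=True to pd.cut includes it, which is similar to what the (-0.001, 20.0] bin in the value_counts(bins=...) output does.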