python python-3.x pandas dataframe binning

How to assign a label in pandas.cut() when a value does not fall within any bin boundaries


I have a dataframe with continuous numerical values that I want to convert into ordinal values as a categorical feature. When a value falls outside the bin boundaries, pd.cut() returns NaN for it. Instead, I want to assign a new label to those values.

My dataframe:

          a
0       200
1  10000000
2     60000
3      5000
4         2
5    700000

Here is what I tried:

import pandas as pd

df = pd.DataFrame({'a': [200, 10000000, 60000, 5000, 2, 700000]})
bins = [0, 100, 1000, 10000, 50000, 100000, 1000000]
labels = [1, 2, 3, 4, 5, 6]
binned_out = pd.cut(df['a'], bins=bins, labels=labels)

binned_out output:

0      2
1    NaN
2      5
3      3
4      1
5      6
Name: a, dtype: category
Categories (6, int64): [1 < 2 < 3 < 4 < 5 < 6]

Expected output, with NaN values returned as 0:

0      2
1      0
2      5
3      3
4      1
5      6

Solution

  • Use cat.add_categories to register 0 as a valid category, then Series.fillna to replace the NaN values with it:

    binned_out = pd.cut(df['a'], bins=bins, labels=labels).cat.add_categories([0]).fillna(0)
    print(binned_out)
    0    2
    1    0
    2    5
    3    3
    4    1
    5    6
    Name: a, dtype: category
    Categories (7, int64): [1 < 2 < 3 < 4 < 5 < 6 < 0]
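
Note that the output above lists 0 as the last category (6 < 0). If the order matters, or if plain integers are more convenient than a categorical, the following minimal sketch shows two optional follow-up steps. It reuses the data from the question; the variable names `filled`, `reordered`, and `as_int` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [200, 10000000, 60000, 5000, 2, 700000]})
bins = [0, 100, 1000, 10000, 50000, 100000, 1000000]
labels = [1, 2, 3, 4, 5, 6]

# Values outside (0, 1000000] fall in no bin and become NaN.
binned = pd.cut(df['a'], bins=bins, labels=labels)

# 0 must be a known category before fillna can use it.
filled = binned.cat.add_categories([0]).fillna(0)

# Optional: move 0 to the front so the order reads 0 < 1 < ... < 6.
reordered = filled.cat.reorder_categories([0] + labels, ordered=True)

# Optional: drop the categorical dtype and keep plain integers.
as_int = filled.astype(int)

print(reordered.cat.categories.tolist())
print(as_int.tolist())
```

Whether to keep the categorical dtype or convert to integers depends on how the feature is used downstream; both hold the same labels.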