I have a dataframe with continuous numerical values that I want to convert into ordinal values as a categorical feature. When a value falls outside the bin boundaries, it is returned as NaN, but I want to assign a new label to those values instead.
My dataframe:
a
0 200
1 10000000
2 60000
3 5000
4 2
5 700000
Here is what I tried:
import pandas as pd
df = pd.DataFrame({'a': [200, 10000000, 60000, 5000, 2, 700000]})
bins = [0, 100, 1000, 10000, 50000, 100000, 1000000]
labels = [1, 2, 3, 4, 5, 6]
binned_out = pd.cut(df['a'], bins=bins, labels=labels)
binned_out
Output:
0 2
1 NaN
2 5
3 3
4 1
5 6
Name: a, dtype: category
Categories (6, int64): [1 < 2 < 3 < 4 < 5 < 6]
Expected output, with NaN values returned as 0:
0 2
1 0
2 5
3 3
4 1
5 6
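Use cat.add_categories with Series.fillna. A categorical Series can only be filled with a value that is already one of its categories, so 0 has to be added as a category first: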
binned_out = pd.cut(df['a'], bins=bins, labels=labels).cat.add_categories([0]).fillna(0)
print(binned_out)
0 2
1 0
2 5
3 3
4 1
5 6
Name: a, dtype: category
Categories (7, int64): [1 < 2 < 3 < 4 < 5 < 6 < 0]
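Note that in the output above the new label 0 is appended at the end of the category order (6 < 0). If the ordering matters, one possible variation (a sketch, assuming you want 0 to rank below all the other labels) is to use cat.set_categories to put 0 first before filling. The names binned and as_int are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [200, 10000000, 60000, 5000, 2, 700000]})
bins = [0, 100, 1000, 10000, 50000, 100000, 1000000]
labels = [1, 2, 3, 4, 5, 6]

# Values outside the bins (here 10000000) become NaN
binned = pd.cut(df['a'], bins=bins, labels=labels)

# Redefine the categories with 0 placed first, then fill the NaN values with it
binned_out = binned.cat.set_categories([0] + labels, ordered=True).fillna(0)
print(binned_out)

# Optional: plain integers if the categorical dtype is not needed
as_int = binned_out.astype(int)
print(as_int)
```

With this variation the category order should read 0 < 1 < ... < 6, so comparisons and sorting treat the out-of-range label as the lowest one.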