I have the following dataset:
import pandas as pd
data = {"Amount":
["216.00","30.00","30.00","36.00","25.00","38.00","78.8","189.00","43.00","110.00"]}
dataset = pd.DataFrame(data)
I want to discretize these values, i.e. create a new variable that clearly divides them into categories:
dataset["Discretized"] = pd.cut(x = dataset["Amount"],bins = [0,2,200,"Inf"], labels = ["Low",
"Medium", "Large"])
I get results that do not correspond to the discretization rules. For instance, 110 is labelled Low whereas it should be labelled Medium. Likewise, 30 is labelled Large but should be Medium.
Amount Discretized
216.00 Large
30.00 Large
30.00 Large
36.00 Large
25.00 Large
38.00 Large
78.8 Large
189.00 Low
43.00 Large
110.00 Low
How can I achieve my goal and get back correctly discretized values according to the boundaries in the bins argument?
Your Amount column holds strings, not numbers. You should convert it to float and use float("inf") instead of "Inf":
import pandas as pd
dataset["Discretized"] = pd.cut(x=dataset["Amount"].astype(float), bins=[0,2,200,float('inf')], labels=["Low","Medium","Large"])
-----------------------------------------------
Amount Discretized
0 216.00 Large
1 30.00 Medium
2 30.00 Medium
3 36.00 Medium
4 25.00 Medium
5 38.00 Medium
6 78.8 Medium
7 189.00 Medium
8 43.00 Medium
9 110.00 Medium
-----------------------------------------------
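For reference, here is a minimal self-contained sketch of the same fix using the data from the question. Note that `pd.to_numeric(..., errors="coerce")` and `np.inf` are substitutions of mine for `astype(float)` and `float('inf')`; they are not part of the answer above, and should behave the same way on this data.

```python
import numpy as np
import pandas as pd

data = {"Amount": ["216.00", "30.00", "30.00", "36.00", "25.00",
                   "38.00", "78.8", "189.00", "43.00", "110.00"]}
dataset = pd.DataFrame(data)

# Convert the strings to numbers first (assumption: unparseable
# entries should become NaN rather than raise an error).
amounts = pd.to_numeric(dataset["Amount"], errors="coerce")

# pd.cut uses right-closed intervals by default:
# (0, 2] -> Low, (2, 200] -> Medium, (200, inf] -> Large
dataset["Discretized"] = pd.cut(
    amounts,
    bins=[0, 2, 200, np.inf],
    labels=["Low", "Medium", "Large"],
)
print(dataset)
```

Because the intervals are closed on the right, a value of exactly 200 would fall into Medium and exactly 2 into Low; pass `right=False` to `pd.cut` if you need the opposite convention.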