I need to create a categorical variable from the RAM column, with these categories:
Basic: RAM [0-4]
Intermediate: RAM [5-8]
Advanced: RAM [9-12]
Command:
df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic','Intermediate', 'Advaced'])
Error:
TypeError Traceback (most recent call last)
<ipython-input-58-5c93d7c00ba2> in <cell line: 1>()
----> 1 df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic', 'Intermediate', 'Advaced'])
1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
425
426 side: Literal["left", "right"] = "left" if right else "right"
--> 427 ids = ensure_platform_int(bins.searchsorted(x, side=side))
428
429 if include_lowest:
TypeError: '<' not supported between instances of 'int' and 'str'
Could you please help me to fix this? I'm new to Python.
The error means your RAM column holds numbers stored as strings, so convert it with pd.to_numeric before binning (errors="coerce" turns any value that cannot be parsed into NaN):
df['Memory'] = pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), bins=[0, 4, 8, 12],
                      include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
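Before binning, it can help to confirm that the column really is stored as text and to see which entries fail the conversion. The following is a minimal sketch under assumptions: the sample values (including "16GB") and the name bad_values are hypothetical, chosen only to illustrate the check.

```python
import pandas as pd

# Hypothetical sample: RAM stored as strings, with one entry that is not a number.
df = pd.DataFrame({"RAM ": ["4", "8", "12", "16GB", "2"]})

# A non-numeric dtype here (e.g. object) explains the int-vs-str TypeError.
print(df["RAM "].dtype)

# Convert; anything that cannot be parsed becomes NaN.
ram = pd.to_numeric(df["RAM "], errors="coerce")

# Rows that were present but failed to convert.
bad_values = df.loc[ram.isna() & df["RAM "].notna(), "RAM "]
print(bad_values.tolist())

# Bin the numeric values; rows with NaN stay NaN in the result.
df["Memory"] = pd.cut(ram, bins=[0, 4, 8, 12], include_lowest=True,
                      labels=["Basic", "Intermediate", "Advanced"])
print(df)
```

Rows that fail the conversion, and values above the last bin edge, end up as NaN in Memory, so inspect bad_values before relying on the categories.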
For example:
import numpy as np
import pandas as pd

# Simulate a RAM column whose numbers are stored as strings
df = pd.DataFrame({"RAM": np.random.randint(low=1, high=12, size=100).astype(str)})
df["RAM"] = pd.to_numeric(df["RAM"], errors="coerce")
df["Memory"] = pd.cut(df["RAM"], bins=[0, 4, 8, 12],
                      labels=["Basic", "Intermediate", "Advanced"])
Output:
    RAM        Memory
0     2         Basic
1     2         Basic
2     6  Intermediate
..   ..           ...
97    6  Intermediate
98    1         Basic
99    7  Intermediate

[100 rows x 2 columns]
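One detail worth checking is how pd.cut treats values that sit exactly on a bin edge. By default the intervals are closed on the right, so bins=[0, 4, 8, 12] produces (0, 4], (4, 8], (8, 12]; include_lowest=True additionally closes the first interval on the left. A small sketch with hand-picked edge values (the Series below is made up for illustration):

```python
import pandas as pd

# Values chosen to sit on or next to each bin edge.
ram = pd.Series([0, 4, 5, 8, 9, 12])
labels = ["Basic", "Intermediate", "Advanced"]

# With include_lowest=True, the first interval also contains 0.
with_lowest = pd.cut(ram, bins=[0, 4, 8, 12], include_lowest=True, labels=labels)

# Without it, 0 falls outside (0, 4] and becomes NaN.
without_lowest = pd.cut(ram, bins=[0, 4, 8, 12], labels=labels)

print(pd.DataFrame({"RAM": ram, "with_lowest": with_lowest,
                    "without_lowest": without_lowest}))
```

So 4 is Basic, 8 is Intermediate, and 9 through 12 are Advanced; a RAM value of 0 only gets a category when include_lowest=True is passed.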