python dataset categorical-data

Python creating categorical variable error


I need to create a categorical variable from the RAM column, with these categories:

Basic: RAM [0-4]

Intermediate: RAM [5-8]

Advanced: RAM [9-12]

Command:

df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic','Intermediate', 'Advaced'])

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-58-5c93d7c00ba2> in <cell line: 1>()
----> 1 df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic', 'Intermediate', 'Advaced'])

1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    425 
    426     side: Literal["left", "right"] = "left" if right else "right"
--> 427     ids = ensure_platform_int(bins.searchsorted(x, side=side))
    428 
    429     if include_lowest:

TypeError: '<' not supported between instances of 'int' and 'str'

Could you please help me to fix this? I'm new to Python.


Solution

  • The error means your RAM column holds strings rather than numbers, so pd.cut cannot compare its values with the integer bin edges. Convert the column with pd.to_numeric first:

    df['Memory'] = pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), bins=[0, 4, 8, 12],
                          include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
    

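    A reproducible example:

    import numpy as np
    import pandas as pd

    # Simulate a RAM column stored as strings
    df = pd.DataFrame({"RAM": np.random.randint(low=1, high=12, size=100).astype(str)})

    # Convert the strings to numbers; unparseable values become NaN
    df["RAM"] = pd.to_numeric(df["RAM"], errors="coerce")

    # Bin the numeric values into the three categories
    df["Memory"] = pd.cut(df["RAM"], bins=[0, 4, 8, 12],
                          labels=["Basic", "Intermediate", "Advanced"])
    

    Output (the data is random, so the values vary between runs):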

       RAM        Memory
    0    2         Basic
    1    2         Basic
    2    6  Intermediate
    ..  ..           ...
    97   6  Intermediate
    98   1         Basic
    99   7  Intermediate
    
    [100 rows x 2 columns]
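
One caveat with errors="coerce": values that cannot be parsed silently become NaN and end up with no category. Here is a minimal sketch for spotting them before binning. The sample data is made up (the "8GB" and "n/a" entries are hypothetical), and the column name 'RAM ' with its trailing space is taken from the question:

```python
import pandas as pd

# Hypothetical sample data: RAM sizes stored as strings, as in the question.
# "8GB" and "n/a" are invented entries that cannot be parsed as numbers.
df = pd.DataFrame({"RAM ": ["0", "4", "5", "8", "12", "8GB", "n/a"]})

# A non-numeric dtype here is the hint that the column holds text.
print(df["RAM "].dtype)

# Convert to numbers; anything unparseable becomes NaN.
ram = pd.to_numeric(df["RAM "], errors="coerce")

# Values that were present originally but turned into NaN after coercion.
unparsed = df.loc[ram.isna() & df["RAM "].notna(), "RAM "]
print(unparsed.tolist())  # ['8GB', 'n/a']

# Bin the numeric values; rows with NaN stay uncategorized.
df["Memory"] = pd.cut(
    ram,
    bins=[0, 4, 8, 12],
    include_lowest=True,
    labels=["Basic", "Intermediate", "Advanced"],
)
print(df)
```

The boundary values in this sample also show how the bins close: with the default right=True, 4 falls in Basic and 8 in Intermediate, and include_lowest=True is what lets 0 into the first bin.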