
Is there a way to use a lambda, or something quicker than a dictionary, to recode a pandas DataFrame column of unique categories into integer codes like 0, 1, 2, etc.?


Is there a quicker way, via a lambda or otherwise, to recode every unique value in a pandas DataFrame column?

I am trying to recode this without a dictionary or for loop:

   df['Genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play',...

It goes on for a while - a lot of unique values!

I would like to recode them to 0, 1, 2, 3, etc., one integer per unique value.

TIA for any advice


Solution

  • This can be done with pd.factorize (the column is named 'Values' here; with your data it would be df['Genres']):

    df['Encoding'] = pd.factorize(df['Values'])[0]
    

    Let's say I use your sample as input:

    df = pd.DataFrame({'Values':['Art & Design', 'Art & Design;Pretend Play',
           'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
           'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
           'Comics', 'Comics;Creativity', 'Communication', 'Dating',
           'Education', 'Education;Creativity', 'Education;Education',
           'Education;Action & Adventure', 'Education;Pretend Play']})
    

    Using the code proposed above, I get:

                                 Values  Encoding
    0                      Art & Design         0
    1         Art & Design;Pretend Play         1
    2           Art & Design;Creativity         2
    3   Art & Design;Action & Adventure         3
    4                                13         4
    5                   Auto & Vehicles         5
    6                            Beauty         6
    7                 Books & Reference         7
    8                          Business         8
    9                            Comics         9
    10                Comics;Creativity        10
    11                    Communication        11
    12                           Dating        12
    13                        Education        13
    14             Education;Creativity        14
    15              Education;Education        15
    16     Education;Action & Adventure        16
    17           Education;Pretend Play        17
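
The sample above contains each value only once, so the codes simply match the row index. As a minimal sketch of what happens on a column with repeats and a missing value, the example below uses a small made-up `Genres` column (the data is hypothetical, not taken from the question). It relies on `pd.factorize` returning a pair `(codes, uniques)`, with codes assigned in order of first appearance and missing values coded as -1 by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: repeated genres plus one missing value.
df = pd.DataFrame(
    {"Genres": ["Comics", "Dating", "Comics", np.nan, "Beauty", "Dating"]}
)

# factorize returns (codes, uniques); equal values share the same code.
codes, uniques = pd.factorize(df["Genres"])
df["Encoding"] = codes

# Codes index into uniques, so valid (non-negative) codes can be decoded again.
valid = codes >= 0
decoded = uniques[codes[valid]]

print(df)
print(list(uniques))
```

If the codes should follow sorted order rather than order of first appearance, `pd.factorize` also accepts `sort=True`.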