
Can't add categories to a category dtype in pandas


I have a pandas dataframe with a field called "promo_type" which I converted to categorical by using astype:

df['promo_type'] = df['promo_type'].astype('category')

Later on in the code I want to add another category to the field, as follows:

df['promo_type'].add_categories('0')

And I get this error:

AttributeError: 'Series' object has no attribute 'add_categories'

I have checked that my pandas version does have add_categories, and that add_categories is an available method for df['promo_type'].

I have no idea why this isn't working.

Thanks for the help in advance.


Solution

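  • You missed the cat accessor. add_categories is not a method of Series itself; it is available through the cat accessor, as Series.cat.add_categories. It returns a new Series rather than modifying the column in place, so assign the result back: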

    df['promo_type'] = df['promo_type'].cat.add_categories('0')
    

    Setup:

    import pandas as pd

    df = pd.DataFrame({'promo_type': ['a', 'b', 'c']}).astype('category')
    print(df['promo_type'])
    
    # Output
    0    a
    1    b
    2    c
    Name: promo_type, dtype: category
    Categories (3, object): ['a', 'b', 'c']
    

    Add category:

    df['promo_type'] = df['promo_type'].cat.add_categories('0')
    print(df['promo_type'])
    
    # Output
    0    a
    1    b
    2    c
    Name: promo_type, dtype: category
    Categories (4, object): ['a', 'b', 'c', '0']  # <- HERE
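
To see why the result is assigned back, here is a minimal sketch on a standalone Series (the names s and extended are only for illustration, they are not from the question). It assumes a reasonably recent pandas; the exact exception type raised for an unknown category has differed between versions, so both are caught.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category')

# add_categories returns a new Series; s keeps its original categories.
extended = s.cat.add_categories('0')
original_categories = list(s.cat.categories)
extended_categories = list(extended.cat.categories)

# Assigning a value that is not a known category is rejected...
try:
    s.iloc[0] = '0'
    assignment_rejected = False
except (TypeError, ValueError):
    assignment_rejected = True

# ...but it is accepted once the category has been added.
extended.iloc[0] = '0'
print(original_categories, extended_categories, assignment_rejected)
```

This is usually the reason for adding a category in the first place: a value such as '0' can only be stored in the column after it exists as a category.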
    

    Update

    You can call add_categories directly, without the cat accessor, on a CategoricalIndex (a plain Categorical also has the method):

    df = pd.DataFrame({'promo_type': ['a', 'b', 'c']})
    catx = pd.CategoricalIndex(df['promo_type'])
    print(catx)
    
    # Output
    CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category', name='promo_type')
    

    Modify category:

    catx = catx.add_categories('0')
    print(catx)
    
    # Output
    CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c', '0'], ordered=False, dtype='category', name='promo_type')
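
As a rough side-by-side of where the method lives, the sketch below checks each object with hasattr. The variable names are made up for the example, and the pd.Categorical case is an addition to what the answer shows.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype='category', name='promo_type')

# The Series has no add_categories attribute; its .cat accessor does.
series_has_method = hasattr(s, 'add_categories')
accessor_has_method = hasattr(s.cat, 'add_categories')

# A plain Categorical exposes the method directly.
cat_extended = pd.Categorical(['a', 'b', 'c']).add_categories('0')

# So does a CategoricalIndex, as in the update above.
idx_extended = pd.CategoricalIndex(s).add_categories('0')

print(series_has_method, accessor_has_method)
print(list(cat_extended.categories), list(idx_extended.categories))
```

This would explain the AttributeError in the question: the method exists in pandas, just not on the Series object itself.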