python pandas categories

Pandas reorder categories working with NaN


Is there a way of using df['Column'].astype('category') and df['Column'].cat.reorder_categories() to list NaN in one of the positions? .astype() doesn't appear to affect NaN values in my dataframe.

Basically for df['Column'].unique() I have:

['Moderate' 'Liberal' 'Somewhat Conservative' 'Somewhat liberal' 'Very Liberal' 'Very Conservative' 'Conservative' nan]

And I would like to get it to:

['Very Liberal' < 'Liberal' < 'Somewhat liberal' < 'Moderate' < 'Somewhat Conservative' < 'Conservative' < 'Very Conservative' < nan]

I have tried:

df['Column'] = df['Column'].astype('category')

df['Column'] = df['Column'].cat.reorder_categories(['Very Liberal', 'Liberal', 'Somewhat liberal', 'Moderate', 'Somewhat Conservative', 'Conservative', 'Very Conservative', np.nan], ordered=True)

But it throws the error "ValueError: items in new_categories are not the same as in old categories" indicating that np.nan doesn't exist in the categories.

So I guess I'm wondering how to specify/represent NaN as a category, and how to order it within categories of a column.


Solution

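  • reorder_categories requires the new list to contain exactly the same items as the existing categories. Your error comes from a mismatch between the two: np.nan is not a category, and any non-NA label in your ordering that is missing from the column has to be added first with add_categories.

    However, you should not add NaN as a category. NaN is always allowed as a value and is stored with code -1, so it cannot be ordered within the categories. You can, however, choose where NaN is placed when sorting, using the na_position parameter of sort_values (for example na_position='last').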

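    # Desired order of the labels, lowest first; NaN is deliberately left out
    order = ['Very Liberal', 'Liberal', 'Somewhat liberal', 'Moderate', 'Somewhat Conservative', 'Conservative', 'Very Conservative']
    
    # Assumes df['Column'] already has category dtype (see astype('category') in the question)
    df['Column'] = (df['Column']
                    # add any label from `order` that the column does not contain yet
                    .cat.add_categories(set(order).difference(df['Column'].cat.categories))
                    .cat.reorder_categories(order, ordered=True)
                   )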
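
To see that NaN is not one of the categories and is stored with code -1, here is a minimal self-contained sketch. The sample values are hypothetical, and it builds the ordered categorical in one step with pd.CategoricalDtype, which is an alternative to the add_categories / reorder_categories chain rather than the same method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for df['Column']
df = pd.DataFrame({"Column": ["Moderate", "Liberal", np.nan, "Very Conservative"]})

order = ["Very Liberal", "Liberal", "Somewhat liberal", "Moderate",
         "Somewhat Conservative", "Conservative", "Very Conservative"]

# Convert to an ordered categorical; values not listed in `order` would become NaN
df["Column"] = df["Column"].astype(pd.CategoricalDtype(categories=order, ordered=True))

print(df["Column"].cat.categories.tolist())  # the seven labels, NaN is not among them
print(df["Column"].cat.codes.tolist())       # [3, 1, -1, 6]: the missing value has code -1
```

With this approach, labels absent from the data do not need to be added separately, because the dtype carries the full list of categories.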
    

    Now let's sort:

    df['Column'].sort_values(na_position='last')
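
As a small illustration of na_position, the sketch below sorts a hypothetical ordered categorical Series (the sample values are made up) with NaN placed last and then first. The non-missing values follow the category order, not alphabetical order.

```python
import numpy as np
import pandas as pd

order = ["Very Liberal", "Liberal", "Somewhat liberal", "Moderate",
         "Somewhat Conservative", "Conservative", "Very Conservative"]

# Hypothetical sample values, including one missing entry
s = pd.Series(["Moderate", np.nan, "Very Conservative", "Liberal"],
              dtype=pd.CategoricalDtype(categories=order, ordered=True))

nan_last = s.sort_values(na_position="last")
nan_first = s.sort_values(na_position="first")

print(nan_last.tolist())   # ['Liberal', 'Moderate', 'Very Conservative', nan]
print(nan_first.tolist())  # [nan, 'Liberal', 'Moderate', 'Very Conservative']
```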
    

    If you really need NaN to be orderable, replace it with a placeholder string such as 'NAN' and add that string as a category.
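
The placeholder idea could look roughly like the sketch below. The sample values and the name `placeholder` are hypothetical. Note that after this step the placeholder is an ordinary label, so isna() and dropna() no longer treat those rows as missing.

```python
import numpy as np
import pandas as pd

order = ["Very Liberal", "Liberal", "Somewhat liberal", "Moderate",
         "Somewhat Conservative", "Conservative", "Very Conservative"]
placeholder = "NAN"  # hypothetical label standing in for missing values

# Hypothetical sample values, including one missing entry
s = pd.Series(["Moderate", np.nan, "Very Conservative", "Liberal"],
              dtype=pd.CategoricalDtype(categories=order, ordered=True))

# add_categories appends the placeholder at the end, i.e. as the highest category;
# fillna then replaces the missing values with it
s = s.cat.add_categories([placeholder]).fillna(placeholder)

print(s.cat.categories.tolist())   # the seven labels followed by 'NAN'
print(s.sort_values().tolist())    # ['Liberal', 'Moderate', 'Very Conservative', 'NAN']
print((s > "Very Conservative").tolist())  # [False, True, False, False]
```

To place the placeholder somewhere other than the end, the categories can be rearranged afterwards with reorder_categories, since the placeholder is now a real category.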