python pandas merge levels

Merge levels of a categorical variable in pandas


How can I merge levels of a categorical variable in Python?

I have the following dataset:

dataset['Reason'].value_counts().head(5)
Reason  Count
0       339
7       125
11      124
3        82
0        65

Now I want to merge the first and last occurrences of 0, so that the output looks like:

dataset['Reason'].value_counts().head(5)
Reason  Count
0       404
7       125
11      124
3        82
2        52

To obtain the reason I had to split a string, which may be what produced the duplicate levels in the Reason column.

I have tried using loc, but I am wondering whether there is a smarter way to do it:

dataset.loc[dataset['Reason'] == '0', 'Reason'] = 'On request'
dataset.loc[dataset['Reason'] == '0 ', 'Reason'] = 'On request'

Thanks, Michael.


Solution

  • As @anky_91 mentioned, use Series.str.strip if all values are strings:

    dataset['Reason'].str.strip().value_counts().head(5)
    

    If some values are numeric, first cast them to strings with Series.astype:

    dataset['Reason'].astype(str).str.strip().value_counts().head(5)
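
The two snippets above only change what is counted; they leave the column itself untouched. A minimal sketch of making the merge permanent by assigning the cleaned values back might look like this. The sample data and the variable names `counts` and `labelled` are made up for illustration, and the sketch assumes the duplicate levels differ only by surrounding whitespace:

```python
import pandas as pd

# Hypothetical sample data: '0' and '0 ' differ only by a trailing space,
# and one value is numeric rather than a string.
dataset = pd.DataFrame({'Reason': ['0', '0 ', '7', '0', '7 ', 11]})

# Cast to str, strip surrounding whitespace, and assign back to the column
# so that the merged levels persist in the DataFrame.
dataset['Reason'] = dataset['Reason'].astype(str).str.strip()

# '0' and '0 ' are now counted as a single level.
counts = dataset['Reason'].value_counts()
print(counts)

# Optional: relabel the merged level in one step instead of one loc call
# per spelling of the value.
labelled = dataset['Reason'].replace({'0': 'On request'})
print(labelled.value_counts())
```

Once the whitespace is stripped, a single replacement covers every variant of the level, so the repeated loc assignments from the question are no longer needed.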