How can I merge levels of a categorical variable in Python?
I have the following dataset:
dataset['Reason'].value_counts().head(5)
Reason Count
0 339
7 125
11 124
3 82
0 65
Now I want to merge the first and last levels (both shown as 0), so that the output looks like:
dataset['Reason'].value_counts().head(5)
Reason Count
0 404
7 125
11 124
3 82
2 52
To extract the reason, I had to split a string, which may have introduced the duplicate levels in the Reason column.
I have tried the loc function, but I am wondering whether there is a smarter way to do it:
dataset.loc[dataset['Reason'] == '0' , ['Reason']] = 'On request'
dataset.loc[dataset['Reason'] == '0 ' , ['Reason']] = 'On request'
Thanks, Michael.
As @anky_91 mentioned, use Series.str.strip if all values are strings:
dataset['Reason'].str.strip().value_counts().head(5)
If some values are numeric, first cast them to strings with Series.astype:
dataset['Reason'].astype(str).str.strip().value_counts().head(5)
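To make the merge permanent rather than only affecting the counts, the cleaned values can be assigned back to the column. Below is a minimal sketch, assuming the duplicate levels come from stray whitespace; the sample data is made up for illustration, and the 'On request' label is taken from the question.

```python
import pandas as pd

# Hypothetical sample data: '0 ' and '7 ' carry a trailing space, 3 is numeric
dataset = pd.DataFrame({"Reason": ["0", "0 ", "7", "0", "7 ", 3]})

# Before cleaning, '0' and '0 ' are counted as separate levels
before = dataset["Reason"].value_counts()

# Cast to string, strip whitespace, and write the result back to the column
dataset["Reason"] = dataset["Reason"].astype(str).str.strip()
after = dataset["Reason"].value_counts()
print(after.head(5))

# Relabel the merged level once, instead of one loc assignment per variant
dataset["Reason"] = dataset["Reason"].replace({"0": "On request"})
```

Once the whitespace is stripped, a single replace covers every variant of the level, so the separate loc assignments for '0' and '0 ' are no longer needed.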