Is there a way to bin a pandas column of STRINGS into custom groups of custom names. Like similar to the cuts function but for strings.
For example maybe using a list of lists to define what groups are.
grouping_lists = [['Pakistan', 'China', 'Iran'], ['Germany', 'UK', 'Poland'],
['Australia'], ['USA']]
Corresponding to names ['Asia', 'Europe', 'Australia', 'Other']
.
And should something not exist in the list, it get's marked as 'Other'
or something.
Example:
my_id country_name
0 100 Pakistan
1 200 Germany
2 140 Australia
3 400 Germany
4 225 China
5 125 Pakistan
6 600 Poland
7 0 Austria
my_id country_name Groups
0 100 Pakistan Asia
1 200 Germany Europe
2 140 Australia Australia
3 400 Germany Europe
4 225 China Asia
5 125 Pakistan Asia
6 600 Poland Europe
7 0 Austria Other
Here's a way where you can don't need to create map dict manually (in case it is big):
grouping_lists = [['Pakistan', 'China', 'Iran'], ['Germany', 'UK', 'Poland'],
['Australia'], ['USA']]
names = ['Asia', 'Europe', 'Australia', 'Other']
# create a df with mapping information
maps = (pd.DataFrame({'Groups': names, 'country_name': grouping_lists})
.explode('country_name')
.reset_index(drop=True))
# join maps
df = df.merge(maps, on = 'country_name', how='left').fillna("Other")
my_id country_name Groups
0 100 Pakistan Asia
1 200 Germany Europe
2 140 Australia Australia
3 400 Germany Europe
4 225 China Asia
5 125 Pakistan Asia
6 600 Poland Europe
7 0 Austria Other