Search code examples
pythonpandasbinning

How to bin string values according to list of strings?


Is there a way to bin a pandas column of STRINGS into custom groups of custom names. Like similar to the cuts function but for strings.

For example maybe using a list of lists to define what groups are.

grouping_lists = [['Pakistan', 'China', 'Iran'], ['Germany', 'UK', 'Poland'], 
                  ['Australia'], ['USA']] 

Corresponding to names ['Asia', 'Europe', 'Australia', 'Other'].

And should something not exist in the list, it get's marked as 'Other' or something.

Example:

          my_id  country_name
    0     100     Pakistan
    1     200     Germany
    2     140     Australia
    3     400     Germany
    4     225     China
    5     125     Pakistan
    6     600     Poland
    7       0     Austria


          my_id  country_name  Groups
    0     100     Pakistan      Asia
    1     200     Germany       Europe
    2     140     Australia     Australia
    3     400     Germany       Europe
    4     225     China         Asia
    5     125     Pakistan      Asia 
    6     600     Poland        Europe
    7       0     Austria       Other

Solution

  • Here's a way where you can don't need to create map dict manually (in case it is big):

    grouping_lists = [['Pakistan', 'China', 'Iran'], ['Germany', 'UK', 'Poland'], 
                      ['Australia'], ['USA']] 
    names =  ['Asia', 'Europe', 'Australia', 'Other']
    
    # create a df with mapping information
    maps = (pd.DataFrame({'Groups': names, 'country_name': grouping_lists})
            .explode('country_name')
            .reset_index(drop=True))
    
    # join maps
    df = df.merge(maps, on = 'country_name', how='left').fillna("Other")
    
       my_id country_name     Groups
    0    100     Pakistan       Asia
    1    200      Germany     Europe
    2    140    Australia  Australia
    3    400      Germany     Europe
    4    225        China       Asia
    5    125     Pakistan       Asia
    6    600       Poland     Europe
    7      0      Austria      Other