python, pandas, dictionary, substring, partial

Mapping Substrings from dataframe to return values as a new column


I have a postal code column and want to associate a substring of each value with a region. I thought about using a dictionary:

dict = { 'SW1': 'London','NE':'London','W1A':'Other','CT':'Other'}

Postal Code  
SW1E 5Z
NE99 1AR
SW1
W1A 1ER
CT21 4JF

Desired table:

Postal Code   Region
SW1E 5Z       London
NE99 1AR      London
SW1           London
W1A 1ER       Other
CT21 4JF      Other

However, I don't know how to match substrings of a column to create the Region column using Python (pandas). Please advise on the syntax.


Solution

  • Use Series.str.extract with a regex built from the dictionary keys, then map the extracted key back through the dictionary to create the new column.

    df['Region'] = (df['Postal Code']
                    .str.extract('(' + '|'.join(mydict.keys()) + ')', expand=False)
                    .map(mydict))
    print(df)
    

      Postal Code  Region
    0     SW1E 5Z  London
    1    NE99 1AR  London
    2         SW1  London
    3     W1A 1ER   Other
    4    CT21 4JF   Other
    

    Note that I renamed dict to mydict: dict is a Python builtin type, and assigning a dictionary to that name shadows the builtin, so a later call such as dict(...) would raise a TypeError.
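
    For completeness, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the sample frame from the question and makes two changes that are my own assumptions rather than part of the answer above: the pattern is anchored with ^ so keys only match at the start of the postal code (the sample data suggests prefixes, but the question says "substrings"), and each key is passed through re.escape with longer keys tried first, in case a key ever contains a regex metacharacter or one key is a prefix of another. Drop the ^ if a match anywhere in the string is wanted.

```python
import re

import pandas as pd

# Prefix-to-region lookup from the question, named `mydict`
# so that it does not shadow the builtin `dict`.
mydict = {'SW1': 'London', 'NE': 'London', 'W1A': 'Other', 'CT': 'Other'}

# Sample data from the question.
df = pd.DataFrame(
    {'Postal Code': ['SW1E 5Z', 'NE99 1AR', 'SW1', 'W1A 1ER', 'CT21 4JF']}
)

# Assumption: try longer keys first so a longer prefix wins over a
# shorter key that it begins with.
keys = sorted(mydict, key=len, reverse=True)

# Assumption: anchor at the start of the string and escape each key.
pattern = '^(' + '|'.join(re.escape(k) for k in keys) + ')'

# Extract the matching key (NaN when nothing matches), then look up its region.
df['Region'] = df['Postal Code'].str.extract(pattern, expand=False).map(mydict)
print(df)
```

    Postal codes that match none of the keys end up as NaN in Region; chaining .fillna('Unknown') after .map(mydict) is one way to label them, where 'Unknown' is just a placeholder value.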