I have a postal code column, and I want to associate a substring of each row with a region. I thought about using a dictionary:
dict = { 'SW1': 'London','NE':'London','W1A':'Other','CT':'Other'}
Postal Code
SW1E 5Z
NE99 1AR
SW1
W1A 1ER
CT21 4JF
Desired table:
Postal Code  Region
SW1E 5Z      London
NE99 1AR     London
SW1          London
W1A 1ER      Other
CT21 4JF     Other
However, I don't know how to parse the substrings of a column to create the Region column using Python (pandas). Please advise on the syntax.
Use Series.str.extract with a regex built from the dictionary keys, then map the extracted substrings back through the dictionary to create a new column.
df['Region'] = (df['Postal Code'].str.extract('(' + '|'.join(mydict.keys()) + ')', expand=False)
                  .map(mydict))
print(df)
  Postal Code  Region
0     SW1E 5Z  London
1    NE99 1AR  London
2         SW1  London
3     W1A 1ER   Other
4    CT21 4JF   Other
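If every key is meant as a prefix of the postal code (an assumption; the question only says "substrings"), a slightly more defensive variant might look like the sketch below. It anchors the pattern with ^ so a key only matches at the start of the code, escapes each key with re.escape, and tries longer keys first, because Python's regex alternation takes the first alternative that matches, not the longest one. Codes that match no key end up as NaN in Region.

```python
import re

import pandas as pd

# Prefix -> region lookup, taken from the question
mydict = {'SW1': 'London', 'NE': 'London', 'W1A': 'Other', 'CT': 'Other'}

df = pd.DataFrame({'Postal Code': ['SW1E 5Z', 'NE99 1AR', 'SW1', 'W1A 1ER', 'CT21 4JF']})

# Longer keys first, so e.g. a hypothetical 'SW1E' key would beat 'SW1'
keys = sorted(mydict, key=len, reverse=True)

# '^' anchors the match to the start of the string; re.escape makes any
# regex metacharacters in a key match literally
pattern = '^(' + '|'.join(map(re.escape, keys)) + ')'

# With one capture group and expand=False, str.extract returns a Series
# holding the matched key for each row (NaN where nothing matches)
prefix = df['Postal Code'].str.extract(pattern, expand=False)

# Look each extracted key up in the dictionary to get the region
df['Region'] = prefix.map(mydict)
print(df)
```

Dropping the ^ anchor restores the original behaviour of matching a key anywhere in the string.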
Note that I have renamed dict to mydict, since dict is a built-in type: assigning a dictionary to that name shadows the built-in, so a later call such as dict(a=1) fails instead of creating a new dictionary.
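A small illustration of the shadowing problem. The shadowing is kept inside a hypothetical demo function so it does not leak into the rest of the script; at module level (a script or notebook) the same thing happens and lasts until the name is deleted with del dict.

```python
def demo():
    # Assigning to 'dict' creates a local name that hides the built-in type
    dict = {'SW1': 'London'}
    try:
        # This now tries to call the dictionary instance, not the dict type
        dict(a=1)
    except TypeError as exc:
        return str(exc)
    return None


message = demo()
print(message)  # 'dict' object is not callable

# Outside the function the built-in is untouched and still works
print(dict(a=1))  # {'a': 1}
```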