I need to extract the ZIP code (only the ZIP code) from an address column into a new column for further analysis. I am mostly using pandas in my data cleaning phase. I tried this code:
import pandas as pd
df_participant = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/dqthon-participants.csv')
df_participant['postal_code'] = df_participant['address'].str.extract(r'([0-9]\d+)')
print (df_participant[['address','postal_code']].head())
but it did not work.
Any help would be very much appreciated! Thank you
With str.extract (note that \d{5} returns the first run of five digits found in the string):
df_participant['postal_code'] = df_participant['address'].str.extract(r'(\d{5})')
# Or, if the length of the postal code varies, use \d+ anchored with "$" so it matches the digits at the end of the string
df_participant['postal_code'] = df_participant['address'].str.extract(r'(\d+)$')
But you don't need a regex here. Just take the last 5 characters of the string, since the postal code is always at the end.
df_participant['postal_code'] = df_participant['address'].str[-5:]
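To see how the three options differ, here is a small sketch on made-up addresses (the sample rows are hypothetical and not taken from the actual CSV). It also shows why the pattern in the question likely misbehaves: `[0-9]\d+` matches the first run of two or more digits, which can be a house number rather than the postal code. The `str.strip()` call is an added precaution, on the assumption that some rows may carry trailing whitespace.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset may be formatted differently.
df = pd.DataFrame({
    "address": [
        "Jl. Merdeka No. 12345 Blok A, Jakarta 10110",
        "Gg. Mawar 27, Bandung 40115 ",  # note the trailing space
    ]
})

# Strip surrounding whitespace so "$" and [-5:] see the real end of the text.
address = df["address"].str.strip()

# Pattern from the question: first run of 2+ digits, wherever it occurs.
from_question = address.str.extract(r"([0-9]\d+)", expand=False)

# First run of exactly five consecutive digits, wherever it occurs.
first_five = address.str.extract(r"(\d{5})", expand=False)

# Digits anchored to the end of the string (works for any length).
end_anchored = address.str.extract(r"(\d+)$", expand=False)

# Plain slicing: the last five characters, no regex involved.
last_five = address.str[-5:]

df["postal_code"] = end_anchored
print(df)
```

On these rows the end-anchored pattern and the slice agree, while the unanchored patterns can pick up an earlier number in the address. If the postal code is not guaranteed to be the last thing in the string, neither the anchor nor the slice is safe and the pattern would need to be adapted to the actual data.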