python pandas data-cleaning

How to extract zip code from address using pandas function extract()?


I need to extract the ZIP code (and only the ZIP code) from the address into a new column for further analysis. I am mostly using pandas for the data-cleaning phase. I tried this code:

import pandas as pd
df_participant = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/dqthon-participants.csv')

df_participant['postal_code'] = df_participant['address'].str.extract(r'([0-9]\d+)')

print(df_participant[['address', 'postal_code']].head())

But it did not work.

This is the output: [image not available]

Any help would be very much appreciated! Thank you


Solution

  • With str.extract

    df_participant['postal_code'] = df_participant['address'].str.extract(r'(\d{5})')
    
    # Or, if the length of the postal code varies, use \d+ anchored to the end of the string with "$"
    
    df_participant['postal_code'] = df_participant['address'].str.extract(r'(\d+)$')
    
    

    But you don't need a regex here. Just take the last 5 characters of the string, since the postal code is always at the end.

    df_participant['postal_code'] = df_participant['address'].str[-5:]
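
To compare the approaches without downloading the CSV, here is a minimal sketch on a small made-up DataFrame. The sample addresses and the column names `first_digits` and `postal_slice` are assumptions for illustration, not taken from the original dataset. It also shows the likely reason the pattern in the question fails: `([0-9]\d+)` returns the first run of two or more digits, which may be a house number rather than the postal code. The same applies to an unanchored `(\d{5})` if a five-digit number appears earlier in the address, so anchoring with `$` is the safer choice.

```python
import pandas as pd

# Hypothetical sample addresses (not from the original dataset)
df = pd.DataFrame({
    "address": [
        "Jl. Merdeka No. 12, Bandung 40111",
        "Gg. Kenanga 7 Blok B, Medan 20112",
        "Jalan Sudirman 45, Jakarta 10220",
    ]
})

# Pattern from the question: grabs the FIRST run of 2+ digits,
# so a house number like "12" wins over the postal code.
df["first_digits"] = df["address"].str.extract(r"([0-9]\d+)", expand=False)

# Anchored pattern: exactly five digits at the end of the string
# (optional trailing whitespace is tolerated).
# expand=False returns a Series instead of a one-column DataFrame.
df["postal_code"] = df["address"].str.extract(r"(\d{5})\s*$", expand=False)

# Plain slicing: last five characters, after stripping whitespace.
df["postal_slice"] = df["address"].str.strip().str[-5:]

print(df[["first_digits", "postal_code", "postal_slice"]])
```

Note that the slicing version assumes every address really ends with a five-character postal code. The anchored regex yields NaN for rows that do not, which makes bad rows easier to spot.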