python, pandas, data-cleaning

Putting a space between the 3rd and 4th characters of a postal code in Python


I have a df (data frame) in Python with a postal_code variable that holds Canadian postal codes. All the postal codes are written without a space, such as k1b6j2. But for me to use the 'pgeocode' package, I need the postal codes in the standard form, like k1b 6j2 (there must be a space between the 3rd and 4th characters).

So, is there a way to replace all the postal codes with the modified ones in the df?

Many Thanks, Ben


Solution

  • You can simply redefine your column in the format you are expecting:

    import pandas as pd 
    df = pd.DataFrame({'ID':[1,2,3,4],'postal_code':['k1b6j2','k1b6j3','k1b6j4','k1b6j5']}) ##Example dataset
    
       ID postal_code
    0   1      k1b6j2
    1   2      k1b6j3
    2   3      k1b6j4
    3   4      k1b6j5
    

    Redefine the postal_code column in the format you desire:

    df['postal_code'] = df['postal_code'].astype(str)
    df['postal_code'] = [(x[:3]+" "+x[3:]) for x in df['postal_code']]
    

    Output:

       ID postal_code
    0   1     k1b 6j2
    1   2     k1b 6j3
    2   3     k1b 6j4
    3   4     k1b 6j5
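
The slicing above assumes every value is a six-character code with no space yet; a code that already contains the space would get a second one, and a missing value passed through astype(str) would become the text 'nan' with a space added. As a minimal sketch of a more defensive variant, the helper below (format_postal_code is a hypothetical name, not part of the original answer) only rewrites values that look like a Canadian postal code and returns anything else unchanged:

```python
import re

# Hypothetical helper, not from the original answer: matches a Canadian
# postal code (letter-digit-letter, optional whitespace, digit-letter-digit).
_POSTAL_RE = re.compile(r"^([A-Za-z]\d[A-Za-z])\s*(\d[A-Za-z]\d)$")


def format_postal_code(value):
    """Return the code with exactly one space after the 3rd character.

    Values that are not strings, or that do not look like a Canadian
    postal code, are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    match = _POSTAL_RE.match(value.strip())
    if match is None:
        return value
    return f"{match.group(1)} {match.group(2)}"


codes = ["k1b6j2", "k1b 6j3", " k1b6j4 ", "n/a", None]
formatted = [format_postal_code(c) for c in codes]
print(formatted)
```

On the example dataframe this could be applied with df['postal_code'] = df['postal_code'].map(format_postal_code), skipping the astype(str) step so that missing values stay missing.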
    

    Finally, a timing comparison with another method on a dataframe with 100k rows:

    # First method: list comprehension
    df['postal_code'] = [(x[:3]+" "+x[3:]) for x in df['postal_code']]
    # First method time:  0.08260965347290039
    # Second method: slicing through the .str accessor
    df['postal_code'] = df["postal_code"].str[:3] + " " + df["postal_code"].str[3:]
    # Second method time:  0.112518310546875
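
The answer does not show how these timings were measured. One rough way to run a comparison like this yourself is sketched below; the test data (100,000 copies of a single code), the function names, and the use of timeit with five repetitions are all assumptions for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed test data: 100,000 copies of one unformatted code.
base = pd.DataFrame({"postal_code": ["k1b6j2"] * 100_000})


def list_comprehension(df):
    # First method: slice each string inside a Python list comprehension.
    return pd.Series(
        [x[:3] + " " + x[3:] for x in df["postal_code"]], index=df.index
    )


def str_accessor(df):
    # Second method: slice the whole column through the .str accessor.
    return df["postal_code"].str[:3] + " " + df["postal_code"].str[3:]


# Sanity check: both approaches should produce the same values.
same_result = list_comprehension(base).tolist() == str_accessor(base).tolist()

for func in (list_comprehension, str_accessor):
    # Average over 5 runs to smooth out noise.
    seconds = timeit.timeit(lambda: func(base), number=5) / 5
    print(f"{func.__name__}: {seconds:.4f} s per run")
```

Neither function modifies base, so each run starts from the same unformatted column.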