I have a DataFrame (df) in Python with a postal_code column that contains Canadian postal codes. The codes are all stored without a space, e.g. k1b6j2. However, to use the pgeocode package I need the postal codes in the standard form k1b 6j2, with a space between the 3rd and 4th characters.
So, is there a way to replace all the postal codes in the df with the reformatted ones?
Many thanks, Ben
You can simply redefine your column in the format you are expecting:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3, 4], 'postal_code': ['k1b6j2', 'k1b6j3', 'k1b6j4', 'k1b6j5']})  # Example dataset
ID postal_code
0 1 k1b6j2
1 2 k1b6j3
2 3 k1b6j4
3 4 k1b6j5
Redefine the postal_code column in the format you want:
df['postal_code'] = df['postal_code'].astype(str)  # make sure every value is a string
df['postal_code'] = [(x[:3]+" "+x[3:]) for x in df['postal_code']]  # insert a space after the 3rd character
Output:
ID postal_code
0 1 k1b 6j2
1 2 k1b 6j3
2 3 k1b 6j4
3 4 k1b 6j5
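If the column might contain missing values or entries that are not plain six-character codes, a slightly more defensive variant is to let a regular expression insert the space only where the value matches the Canadian letter-digit-letter digit-letter-digit layout. This is a sketch using pandas' Series.str.replace with regex=True; the sample values (a missing entry and an already-spaced code) are made up for illustration. Unlike astype(str), which turns a NaN into the string 'nan', this leaves missing values missing and leaves non-matching values unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    # Hypothetical sample values: one missing, one already formatted
    "postal_code": ["k1b6j2", "K1B6J3", None, "k1b 6j4"],
})

# Match exactly six characters in the A1A1A1 layout and capture
# the first three and last three characters as separate groups.
pattern = r"^([A-Za-z]\d[A-Za-z])(\d[A-Za-z]\d)$"

# Rejoin the two groups with a space; values that don't match
# (including missing values) are left as they are.
df["postal_code"] = df["postal_code"].str.replace(pattern, r"\1 \2", regex=True)
print(df)
```
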
Finally, I compared this with slicing through the pandas .str accessor on a 100k-row DataFrame (times in seconds):
df['postal_code'] = [(x[:3]+" "+x[3:]) for x in df['postal_code']]
First method (list comprehension) time: 0.08260965347290039
df['postal_code'] = df["postal_code"].str[:3] + " " + df["postal_code"].str[3:]
Second method (.str slicing) time: 0.112518310546875
So the list comprehension was the faster of the two in this test.
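Single timings like these depend on the machine and the pandas version, so if you want to repeat the comparison yourself, a timeit harness is a bit more reliable. This is a minimal sketch on a synthetic 100k-row frame (every row holds the same code, which is an assumption made only to keep the example short); the function names list_comprehension and str_accessor are just labels for the two methods above.

```python
import timeit

import pandas as pd

# Synthetic 100k-row frame; real data would have varied codes.
n = 100_000
df = pd.DataFrame({"postal_code": ["k1b6j2"] * n})

def list_comprehension(s):
    # Method 1: plain Python slicing on each string
    return pd.Series([x[:3] + " " + x[3:] for x in s], index=s.index)

def str_accessor(s):
    # Method 2: slicing through the pandas .str accessor
    return s.str[:3] + " " + s.str[3:]

# Check both methods give the same values before comparing speed.
assert list_comprehension(df["postal_code"]).tolist() == str_accessor(df["postal_code"]).tolist()

for func in (list_comprehension, str_accessor):
    # Best of 5 single calls, in seconds
    best = min(timeit.repeat(lambda: func(df["postal_code"]), number=1, repeat=5))
    print(f"{func.__name__}: {best:.4f} s")
```

Taking the minimum of several repeats reduces the effect of background noise on the machine, which is the approach the timeit documentation recommends.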