I have a dataframe dftest
which has these columns: ADDRESS1, ADDRESS2, ADDRESS3, POSTCODE
. I'm trying to clean the data in each column and subsequently merge them into column FULL ADDRESS
with a space between the data in each column.
This is what I'd like to do:
replace(',', '').replace("'", '').replace('.', '').upper().strip()
but I can't figure out how to loop through the columns. Sometimes ADDRESS2
and ADDRESS3
have NaN
entries given there is no data there, but when they do have values, I'd like them to be used in the merged final string.
Gather the columns, using empty strings instead of NaNs:
parts = df[['ADDRESS1', 'ADDRESS2', 'ADDRESS3', 'POSTCODE']].replace(np.nan, '')
Concatenate strings:
full = parts.apply(' '.join, axis=1)
Remove unwanted characters:
clean = full.str.replace("[,'.]", "").upper().strip()
And insert back into the dataframe:
df['FULL ADDRESS'] = clean