Search code examples
pythonloopsdataframedata-cleaning

Cleaning data in multiple columns and concatenating after


I have a dataframe dftest which has these columns: ADDRESS1, ADDRESS2, ADDRESS3, POSTCODE. I'm trying to clean the data in each column and subsequently merge them into column FULL ADDRESS with a space between the data in each column.

This is what I'd like to do: replace(',', '').replace("'", '').replace('.', '').upper().strip() but I can't figure out how to loop through the columns. Sometimes ADDRESS2 and ADDRESS3 have NaN entries given there is no data there, but when they do have values, I'd like them to be used in the merged final string.


Solution

  • Gather the columns, using empty strings instead of NaNs:

    parts = df[['ADDRESS1', 'ADDRESS2', 'ADDRESS3', 'POSTCODE']].replace(np.nan, '')
    

    Concatenate strings:

    full = parts.apply(' '.join, axis=1)
    

    Remove unwanted characters:

    clean = full.str.replace("[,'.]", "").upper().strip()
    

    And insert back into the dataframe:

    df['FULL ADDRESS'] = clean