python pandas dataframe data-cleaning

Fill Null address based on the same owner in Python


Let's say I have a table of house cleaning services like this.

| Customer| House Address | Date     |
| ------- | ------------- | -------- |
| Sam     | London        | 10/01/22 |
| Lina    | Manchester    | 12/01/22 |
| Sam     | Null          | 15/01/22 |

We know that Sam's house address should be London (assume that the customer id is the same).

How can I fill the third row based on the first row?

Data:

{'Customer': ['Sam', 'Lina', 'Sam'],
 'House Address': ['London', 'Manchester', nan],
 'Date': ['10/01/22', '12/01/22', '15/01/22']}

Solution

  • You could groupby "Customer" and transform "House Address" with first (first skips NaN values, so only London will be selected for Sam). This returns a Series with the same index as the original df, where each row holds the first non-null address of that row's customer.

    Then pass this to fillna to fill NaN values in "House Address":

    df['House Address'] = df['House Address'].fillna(df.groupby('Customer')['House Address'].transform('first'))
    

    Output:

      Customer House Address      Date
    0      Sam        London  10/01/22
    1     Lina    Manchester  12/01/22
    2      Sam        London  15/01/22
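
For reference, here is a minimal end-to-end sketch of the approach above, built from the sample data in the question. The names `first_address` and `late` are only illustrative, and `np.nan` is assumed to be how the missing address is stored. The second frame is a made-up case, not from the question, showing that the fill still works when the missing row comes before the known address.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan stands in for the missing address.
df = pd.DataFrame({
    'Customer': ['Sam', 'Lina', 'Sam'],
    'House Address': ['London', 'Manchester', np.nan],
    'Date': ['10/01/22', '12/01/22', '15/01/22'],
})

# First non-null address per customer, aligned to the original row index.
first_address = df.groupby('Customer')['House Address'].transform('first')

# Only missing entries are replaced; existing addresses are left untouched.
df['House Address'] = df['House Address'].fillna(first_address)
print(df)

# Hypothetical extra case: the missing row appears before the known one.
late = pd.DataFrame({
    'Customer': ['Sam', 'Sam'],
    'House Address': [np.nan, 'London'],
})
late['House Address'] = late['House Address'].fillna(
    late.groupby('Customer')['House Address'].transform('first')
)
print(late)
```

Because first picks the first non-null value in each group, the order of the rows does not matter as long as the customer has at least one known address.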