Search code examples
pandasdataframefunctiondatetimepython-datetime

How to Coalesce datetime values from 3 columns into a single column in a pandas dataframe?


I have a dataframe with 3 date columns in datetime format:

CLIENT_ID DATE_BEGIN DATE_START DATE_REGISTERED
1 2020-01-01 2020-01-01 2020-01-01
2 2020-01-02 2020-02-01 2020-01-01
3 NaN 2020-05-01 2020-04-01
4 2020-01-01 2020-01-01 NaN

How do I create (coalesce) a new column with the earliest datetime for each row resulting in an ACTUAL_START_DATE

CLIENT_ID DATE_BEGIN DATE_START DATE_REGISTERED ACTUAL_START_DATE
1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
2 2020-01-02 2020-02-01 2020-01-01 2020-01-01
3 NaN 2020-05-01 2020-04-01 2020-04-01
4 2020-01-01 2020-01-02 NaN 2020-01-01

some sort of variation with bfill?


Solution

  • You are right, a mix of bfill and ffill on the axis columns should do it:

    df.assign(ACTUAL_START_DATE = df.filter(like='DATE')
                                    .bfill(axis=1)
                                    .ffill(axis=1)
                                    .min(axis=1)
            )
     
       CLIENT_ID  DATE_BEGIN  DATE_START DATE_REGISTERED ACTUAL_START_DATE
    0          1  2020-01-01  2020-01-01      2020-01-01        2020-01-01
    1          2  2020-01-02  2020-02-01      2020-01-01        2020-01-01
    2          3         NaN  2020-05-01      2020-04-01        2020-04-01
    3          4  2020-01-01  2020-01-01             NaN        2020-01-01