I have a dataframe with 3 date columns in datetime format:
CLIENT_ID | DATE_BEGIN | DATE_START | DATE_REGISTERED |
---|---|---|---|
1 | 2020-01-01 | 2020-01-01 | 2020-01-01 |
2 | 2020-01-02 | 2020-02-01 | 2020-01-01 |
3 | NaN | 2020-05-01 | 2020-04-01 |
4 | 2020-01-01 | 2020-01-01 | NaN |
How do I create (coalesce) a new column with the earliest datetime for each row resulting in an ACTUAL_START_DATE
CLIENT_ID | DATE_BEGIN | DATE_START | DATE_REGISTERED | ACTUAL_START_DATE |
---|---|---|---|---|
1 | 2020-01-01 | 2020-01-01 | 2020-01-01 | 2020-01-01 |
2 | 2020-01-02 | 2020-02-01 | 2020-01-01 | 2020-01-01 |
3 | NaN | 2020-05-01 | 2020-04-01 | 2020-04-01 |
4 | 2020-01-01 | 2020-01-02 | NaN | 2020-01-01 |
some sort of variation with bfill
?
You are right, a mix of bfill
and ffill
on the axis columns should do it:
df.assign(ACTUAL_START_DATE = df.filter(like='DATE')
.bfill(axis=1)
.ffill(axis=1)
.min(axis=1)
)
CLIENT_ID DATE_BEGIN DATE_START DATE_REGISTERED ACTUAL_START_DATE
0 1 2020-01-01 2020-01-01 2020-01-01 2020-01-01
1 2 2020-01-02 2020-02-01 2020-01-01 2020-01-01
2 3 NaN 2020-05-01 2020-04-01 2020-04-01
3 4 2020-01-01 2020-01-01 NaN 2020-01-01