I am attempting to create a column based and populate it with a valued based on the comparison of one date column with three other date columns.
An example of the DataFrame df
is shown below. All dates shown are already converted to pd.to_datetime
, which resulted in a number of NaT
values as the individual has not progressed
1st_date 2nd_date 3rd_date action_date
2015-10-05 NaT NaT 2015-12-03
2015-02-27 2015-03-14 2015-03-15 2015-04-08
2015-03-07 2015-03-27 2015-03-28 2015-03-27
2015-01-05 2015-01-20 2015-01-21 2015-05-20
2015-01-05 2015-01-20 2015-01-21 2015-09-16
2015-05-23 2015-06-18 2015-06-19 2015-07-01
2015-03-03 NaT NaT 2015-07-23
2015-03-03 NaT NaT 2015-11-14
2015-06-05 2015-06-19 2015-06-20 2015-10-24
2015-10-08 2015-10-21 2015-10-22 2015-12-22
I am attempting to create a fifth column containing the result (or group) of the comparison of the action_date
column with the previous three date columns 1st_date, 2nd_date, 3rd_date
.
I am attempting to populate a fifth column called action_group
with the a string assigning each date to a group.
Pseudo-code for the potential function (and expected output) is as : if action_date > 1st_date and < 2nd_date then action_group = '1st_action_group'
The same comparison is needed for action_date
, 2nd_date
and 3rd_date
, which would result in an output of 2nd_action_group
in the action_group
column.
Finally if action_date
is greater than 3rd_date
, action_group
would be assigned a value of 3rd_action_group
.
An example of the expected output is shown below.
1st_date 2nd_date 3rd_date action_date action_group
2015-10-05 NaT NaT 2015-12-03 1st_action_group
2015-02-27 2015-03-14 2015-03-15 2015-04-08 3rd_action_group
2015-03-07 2015-03-27 2015-03-28 2015-03-27 2nd_action_group
2015-01-05 2015-01-20 2015-01-21 2015-05-20 3rd_action_group
2015-01-05 2015-01-20 2015-01-21 2015-09-16 3rd_action_group
2015-05-23 2015-06-18 2015-06-19 2015-07-01 3rd_action_group
2015-03-03 NaT NaT 2015-07-23 1st_action_group
2015-03-03 NaT NaT 2015-11-14 1st_action_group
2015-06-05 2015-06-19 2015-06-20 2015-10-24 3rd_action_group
2015-10-08 2015-10-21 2015-10-22 2015-12-22 3rd_action_group
Any assistance that anyone could provide would be greatly appreciated.
df['action_group'] = np.where(df['action_date']>df['3rd_date'],
'3rd_action_group',
np.where(((df['action_date'] >= df['2nd_date'])&(df['action_date']<df['3rd_date'])),
'2nd_action_group',
'1st_action_group'))
You can just stack 2 np.where's to get your desired results.
1st_date 2nd_date 3rd_date action_date action_group
0 2015-10-05 NaT NaT 2015-12-03 1st_action_group
1 2015-02-27 2015-03-14 2015-03-15 2015-04-08 3rd_action_group
2 2015-03-07 2015-03-27 2015-03-28 2015-03-27 2nd_action_group
3 2015-01-05 2015-01-20 2015-01-21 2015-05-20 3rd_action_group
4 2015-01-05 2015-01-20 2015-01-21 2015-09-16 3rd_action_group
5 2015-05-23 2015-06-18 2015-06-19 2015-07-01 3rd_action_group
6 2015-03-03 NaT NaT 2015-07-23 1st_action_group
7 2015-03-03 NaT NaT 2015-11-14 1st_action_group
8 2015-06-05 2015-06-19 2015-06-20 2015-10-24 3rd_action_group
9 2015-10-08 2015-10-21 2015-10-22 2015-12-22 3rd_action_group