Search code examples
pythonpandaspython-datetime

Create groups based on multiple DateTime comparison


I am attempting to create a column based and populate it with a valued based on the comparison of one date column with three other date columns.

An example of the DataFrame df is shown below. All dates shown are already converted to pd.to_datetime, which resulted in a number of NaT values as the individual has not progressed

    1st_date     2nd_date        3rd_date     action_date
    2015-10-05   NaT             NaT          2015-12-03 
    2015-02-27   2015-03-14      2015-03-15   2015-04-08 
    2015-03-07   2015-03-27      2015-03-28   2015-03-27 
    2015-01-05   2015-01-20      2015-01-21   2015-05-20 
    2015-01-05   2015-01-20      2015-01-21   2015-09-16 
    2015-05-23   2015-06-18      2015-06-19   2015-07-01 
    2015-03-03   NaT             NaT          2015-07-23 
    2015-03-03   NaT             NaT          2015-11-14 
    2015-06-05   2015-06-19      2015-06-20   2015-10-24 
    2015-10-08   2015-10-21      2015-10-22   2015-12-22 

I am attempting to create a fifth column containing the result (or group) of the comparison of the action_date column with the previous three date columns 1st_date, 2nd_date, 3rd_date.

I am attempting to populate a fifth column called action_group with the a string assigning each date to a group.

Pseudo-code for the potential function (and expected output) is as : if action_date > 1st_date and < 2nd_date then action_group = '1st_action_group'

The same comparison is needed for action_date, 2nd_date and 3rd_date, which would result in an output of 2nd_action_group in the action_group column.

Finally if action_date is greater than 3rd_date, action_group would be assigned a value of 3rd_action_group.

An example of the expected output is shown below.

1st_date     2nd_date        3rd_date     action_date  action_group
2015-10-05   NaT             NaT          2015-12-03   1st_action_group
2015-02-27   2015-03-14      2015-03-15   2015-04-08   3rd_action_group
2015-03-07   2015-03-27      2015-03-28   2015-03-27   2nd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-05-20   3rd_action_group
2015-01-05   2015-01-20      2015-01-21   2015-09-16   3rd_action_group
2015-05-23   2015-06-18      2015-06-19   2015-07-01   3rd_action_group
2015-03-03   NaT             NaT          2015-07-23   1st_action_group
2015-03-03   NaT             NaT          2015-11-14   1st_action_group
2015-06-05   2015-06-19      2015-06-20   2015-10-24   3rd_action_group
2015-10-08   2015-10-21      2015-10-22   2015-12-22   3rd_action_group

Any assistance that anyone could provide would be greatly appreciated.


Solution

  • df['action_group'] = np.where(df['action_date']>df['3rd_date'], 
                                  '3rd_action_group', 
                                   np.where(((df['action_date'] >= df['2nd_date'])&(df['action_date']<df['3rd_date'])), 
                                              '2nd_action_group', 
                                              '1st_action_group'))
    

    You can just stack 2 np.where's to get your desired results.

        1st_date    2nd_date    3rd_date    action_date action_group
    0   2015-10-05     NaT          NaT     2015-12-03  1st_action_group
    1   2015-02-27  2015-03-14  2015-03-15  2015-04-08  3rd_action_group
    2   2015-03-07  2015-03-27  2015-03-28  2015-03-27  2nd_action_group
    3   2015-01-05  2015-01-20  2015-01-21  2015-05-20  3rd_action_group
    4   2015-01-05  2015-01-20  2015-01-21  2015-09-16  3rd_action_group
    5   2015-05-23  2015-06-18  2015-06-19  2015-07-01  3rd_action_group
    6   2015-03-03     NaT          NaT     2015-07-23  1st_action_group
    7   2015-03-03     NaT          NaT     2015-11-14  1st_action_group
    8   2015-06-05  2015-06-19  2015-06-20  2015-10-24  3rd_action_group
    9   2015-10-08  2015-10-21  2015-10-22  2015-12-22  3rd_action_group