Search code examples
pandasrank

Create an order column in a pandas dataframe based on one date column relative to another date column


I have an extract where I'm required to identify a certain type of surgery X seen in the Surg Type column.

I need to retain medical appointments seen as distinct rows over a window/period of time, 3 appointment's prior (-3, -2, -1) and 3 appointment's post (+1, +2, +3)

I must include this order as an additional column.

On top of this I need to exclude any appointments outside of the window and any other type of Surg Type, in this example any surgery's denoted Z.

In this example 7/9 rows/record's I want retained and an additional column Prior Post

*** UPDATED EXAMPLE ***

Original Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt Date |
|------------|---------|-----------|------------|-------------------|
| 1          | 1       | X         | 2022-09-03 | 2022-01-01        |
| 1          | 1       | X         | 2022-09-03 | 2022-03-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-05-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-06-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-04        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-29        |
| 1          | 2       | Z         | 2022-12-01 | 2022-11-01        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-02        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-13        |



Desired Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt Date | Inclusion |
|------------|---------|-----------|------------|-------------------|-----------|
| 1          | 1       | X         | 2022-09-03 | 2022-01-01        | Exclude   |
| 1          | 1       | X         | 2022-09-03 | 2022-03-04        | -3        |
| 1          | 1       | X         | 2022-09-03 | 2022-05-04        | -2        |
| 1          | 1       | X         | 2022-09-03 | 2022-06-04        | -1        |
| 1          | 1       | X         | 2022-09-03 | 2022-11-04        | 1         |
| 1          | 1       | X         | 2022-09-03 | 2022-11-29        | 2         |
| 1          | 2       | Z         | 2022-12-01 | 2022-11-01        | -1        |
| 1          | 1       | X         | 2022-09-03 | 2023-01-02        | 3         |
| 1          | 1       | X         | 2022-09-03 | 2023-01-13        | Exclude   |

Solution

  • You can filter the surgeries X, then compute the time difference between the appointment and the surgery and compute the rank for negative and positive numbers per group with groupby.apply. Finally, mask the ranks outside of threshold.

    # number of medical appointments to keep before/after a surgery
    N = 3
    
    # columns to use a grouper
    group_cols = ['Patient ID', 'Surg ID']
    
    # ensure datetime
    df[['Surg Date', 'Medical Appt Date']] = df[['Surg Date', 'Medical Appt Date']].apply(pd.to_datetime)
    
    # custom function to compute the rank of the Appt Date
    def flag_nearest(g, N):
        diff = g['Medical Appt Date'].sub(g['Surg Date'])
        m = diff>'0'
        ranks = (diff.where(m).rank()
                 .fillna(-diff.mask(m).rank(ascending=False))
                 # optional, to have integers in the output
                 .astype(int).astype(object)
                )
        return ranks.where(ranks.abs()<=N, 'Excluded').to_frame()
    
    # identify non-X rows
    m = df['Surg Type'].eq('X')
    
    # compute the rank and flag columns outside of threshold
    df['Inclusion'] = (df[m].groupby(group_cols)
                            .apply(flag_nearest, N)
                            .droplevel(group_cols)
                            # optional, to fill the non-X
                            .reindex(df.index, fill_value='non-X')
                      )
    

    Output:

       Patient ID  Surg ID Surg Type  Surg Date Medical Appt Date Inclusion
    0           1        1         X 2022-09-03        2022-01-01  Excluded
    1           1        1         X 2022-09-03        2022-03-04        -3
    2           1        1         X 2022-09-03        2022-05-04        -2
    3           1        1         X 2022-09-03        2022-06-04        -1
    4           1        1         X 2022-09-03        2022-11-04         1
    5           1        1         X 2022-09-03        2022-11-29         2
    6           1        2         Z 2022-12-01        2022-11-01     non-X
    7           1        1         X 2022-09-03        2023-01-02         3
    8           1        1         X 2022-09-03        2023-01-13  Excluded
    

    Extended example:

        Patient ID  Surg ID Surg Type  Surg Date Medical Appt Date Inclusion
    0            1        1         X 2022-09-03        2022-01-01  Excluded
    1            1        1         X 2022-09-03        2022-03-04        -3
    2            1        1         X 2022-09-03        2022-05-04        -2
    3            1        1         X 2022-09-03        2022-06-04        -1
    4            1        1         X 2022-09-03        2022-11-04         1
    5            1        1         X 2022-09-03        2022-11-29         2
    6            1        2         Z 2022-12-01        2022-11-01     non-X
    7            1        1         X 2022-09-03        2023-01-02         3
    8            1        1         X 2022-09-03        2023-01-13  Excluded
    9            2        1         X 2022-09-05        2022-01-01  Excluded
    10           2        1         X 2022-09-05        2022-01-02  Excluded
    11           2        1         X 2022-09-05        2022-03-04        -3
    12           2        1         X 2022-09-05        2022-05-04        -2
    13           2        1         X 2022-09-05        2022-06-04        -1
    14           2        1         X 2022-09-05        2022-11-04         1
    15           2        1         X 2022-09-05        2022-11-29         2
    16           2        2         Z 2022-12-02        2022-11-01     non-X
    17           2        1         X 2022-09-05        2023-01-02         3
    18           2        1         X 2022-09-05        2023-01-13  Excluded