Search code examples
pythonpandasdataframedata-sciencedata-preprocessing

Corresponding week column for each different participant with starting dates


I have a sample dataframe as given below.

import pandas as pd
data = {'ID':['A','A','A','A','A', 'B','B','B','C','C','C','C'],
    'Date':['9/20/2021  4:34:57 AM', '9/21/2021  6:54:22 PM', '9/28/2021  5:54:22 PM', 
           '9/30/2021  6:54:22 PM', '10/5/2021  6:54:22 PM',
           '9/4/2021  7:04:38 PM','9/9/2021  7:04:38 PM','9/17/2021  7:04:38 PM',
           '9/28/2021  3:21:23 AM','9/30/2021  3:21:23 AM','10/05/2021  3:21:23 
            AM','10/15/2021  3:21:23 AM']}
    
 df1 = pd.DataFrame(data)
 df1

The 'Date' column is in datetime format.

Now for each participant, the first entry is the enrolled date and that would be the start of week 1 for that particular participant. I want corresponding week number for the subsequent date entries, considering the first entry as Day 0.

The final dataframe should look like the image given below.

enter image description here

Any help is greatly appreciated. Thank you.


Solution

  • Try:

    df['Date'] = pd.to_datetime(df['Date'])
    
    df['Week'] = df.groupby('ID')['Date'] \
                   .apply(lambda t: 1 + (t - min(t)).dt.days // 7)
    
    df['Week'] = 'Week' + df['Week'].astype(str)
    

    Output:

    >>> df
       ID                Date   Week
    0   A 2021-09-20 04:34:57  Week1
    1   A 2021-09-21 18:54:22  Week1
    2   A 2021-09-28 17:54:22  Week2
    3   A 2021-09-30 18:54:22  Week2
    4   A 2021-10-05 18:54:22  Week3
    5   B 2021-09-04 19:04:38  Week1
    6   B 2021-09-09 19:04:38  Week1
    7   B 2021-09-17 19:04:38  Week2
    8   C 2021-09-28 03:21:23  Week1
    9   C 2021-09-30 03:21:23  Week1
    10  C 2021-10-05 03:21:23  Week2
    11  C 2021-10-15 03:21:23  Week3
    

    Fixed by @ddejohn