Search code examples
pythonpandasgroup-by

groupby with conditions .nth()


I am trying to get an additional column ['readmit'] that gives the time elapsed in days since the indexed admission discharge date(first occurance) for each new episode of the same keys admit date.

df =  pd.DataFrame({'Key': ['10003', '10003', '10003', '10003', '10003','10003','10034', '10034', '10034', '10034'], 
    'admit': [20120506, 20120508, 20121014,20121016,20121020,20121110,20120506,20120517, 20120601, 20120606],  
    'discharge': [20120508, 20120510, 20121015,20121018,20121023,20121118,20120520,20120520, 20120605, 20120620]})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')
    Key admit           discharge
0   10003   2012-05-06  2012-05-08
1   10003   2012-05-08  2012-05-10
2   10003   2012-10-14  2012-10-15
3   10003   2012-10-16  2012-10-18
4   10003   2012-10-20  2012-10-23
5   10003   2012-11-10  2012-11-18
6   10034   2012-05-06  2012-05-20
7   10034   2012-05-21  2012-05-26
8   10034   2012-06-01  2012-06-05
9   10034   2012-06-09  2012-06-20

final df should look like this:

    Key admit           discharge       readmit
0   10003   2012-05-06  2012-05-08        0
1   10003   2012-05-08  2012-05-10        0
2   10003   2012-10-14  2012-10-15        159
3   10003   2012-10-16  2012-10-18        161
4   10003   2012-10-20  2012-10-23        165
5   10003   2012-11-10  2012-11-18        186
6   10034   2012-05-06  2012-05-20        0 
7   10034   2012-05-21  2012-05-26        1
8   10034   2012-06-01  2012-06-05        12
9   10034   2012-06-09  2012-06-20        20 

This code does not work:

df['readmit']= (df['admit'].groupby(df.Key) - df['discharge'].groupby(df.Key).nth[0]).dt.days

Solution

  • You can subtract column by Series.sub with first values per grousp by GroupBy.transform and GroupBy.first, get days by Series.dt.days and convert negative values to 0 by Series.clip:

    df['readmit'] = (df['admit'].sub(df.groupby('Key')['discharge'].transform('first'))
                                .dt.days
                                .clip(lower=0))
    print (df)
         Key      admit  discharge  readmit
    0  10003 2012-05-06 2012-05-08        0
    1  10003 2012-05-08 2012-05-10        0
    2  10003 2012-10-14 2012-10-15      159
    3  10003 2012-10-16 2012-10-18      161
    4  10003 2012-10-20 2012-10-23      165
    5  10003 2012-11-10 2012-11-18      186
    6  10034 2012-05-06 2012-05-20        0
    7  10034 2012-05-17 2012-05-20        0
    8  10034 2012-06-01 2012-06-05       12
    9  10034 2012-06-06 2012-06-20       17