Search code examples
pythonpandasdata-manipulation

Pandas: delete rows based on the value of other row


I have a pandas data frame with column names as 'ID', 'datetime', 'medication'. Now I am trying to eliminate all the morphine medication rows with respect to methadone medication given time.

For example:

ID-66531 received first methadone medication at time 2013-02-19 20:54. Since methadone and morphine, both are pain killers the effect of the medications are the same. So I am trying to eliminate morphine medication rows if methadone was given within 24hr period of morphine medication for each ID group. So for the ID-66531 if any morphine falls between the 24 hour time range of methadone given which is 2013-02-18 20:53 and 2013-02-19 20:54 in this case, I want to remove those record for each ID group

Actual dataframe:

64347   2012-02-15 19:24:00 morphine
64347   2012-02-15 22:54:00 morphine
64347   2012-02-16 05:50:00 morphine
64347   2012-02-16 12:47:00 oxyCODONE
64347   2012-10-20 05:03:00 morphine
64347   2012-10-21 15:10:00 morphine
64347   2012-10-21 19:06:00 morphine
64347   2012-11-16 16:00:00 morphine
64990   2012-09-21 07:10:00 oxyCODONE
64990   2012-09-21 16:45:00 fentaNYL
64990   2012-09-21 23:21:00 oxyCODONE
64990   2012-09-22 15:40:00 oxyCODONE
66531   2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531   2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531   2013-02-18 22:36:00 morphine
66531   2013-02-19 05:07:00 morphine
66531   2013-02-19 10:13:00 morphine
66531   2013-02-19 15:27:00 morphine
66531   2013-02-19 17:33:00 oxyCODONE
66531   2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531   2013-02-19 20:54:00 methadone
66531   2013-02-20 06:46:00 methadone
66531   2013-02-20 13:21:00 methadone
66531   2013-02-20 22:18:00 methadone

Expected data frame:

64347   2012-02-15 19:24:00 morphine
64347   2012-02-15 22:54:00 morphine
64347   2012-02-16 05:50:00 morphine
64347   2012-02-16 12:47:00 oxyCODONE
64347   2012-10-20 05:03:00 morphine
64347   2012-10-21 15:10:00 morphine
64347   2012-10-21 19:06:00 morphine
64347   2012-11-16 16:00:00 morphine
64990   2012-09-21 07:10:00 oxyCODONE
64990   2012-09-21 16:45:00 fentaNYL
64990   2012-09-21 23:21:00 oxyCODONE
64990   2012-09-22 15:40:00 oxyCODONE
66531   2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531   2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531   2013-02-19 17:33:00 oxyCODONE
66531   2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531   2013-02-19 20:54:00 methadone
66531   2013-02-20 06:46:00 methadone
66531   2013-02-20 13:21:00 methadone
66531   2013-02-20 22:18:00 methadone

What I tried:

I created a timediff column which gets the time difference of the between two records.

df['timediff'] = df.sort_values(by= 'datetime').groupby('ID')['datetime'].diff()

then I assigned 24 hours to a variable x

x = '24:00:00' and tried the following code to remove morphine

df = df.groupby('ID').apply(lambda x : x.loc[~(x['Medication'] == 'morphine')<x]

but this code removes the morphine records irrespective of the methadone medication given time.

I'd really appreciate if I can get some help with this.


Solution

  • Here's what I came up with:

    import pandas
    df = pandas.read_csv('test.tsv', header=None, sep='\s+',
                         parse_dates = [['date', 'time']],
                         names=['patient_id', 'date', 'time','drug'])
    
    to_drop = set()
    for _, patient in df.groupby('patient_id'):
        meth_times = patient.loc[patient['drug'] == 'methadone']['date_time']
        morph_doses = patient.loc[patient['drug'] == 'morphine']
        for i, md in morph_doses.iterrows():
            for mt in meth_times:
                days_elapsed = (mt - md['date_time']).days
                if days_elapsed < 1:
                    to_drop.add(i)
                    break
    
    df = df[~df.index.isin(to_drop)]
    

    As with @ulmefors suggested answer, this is iterating repeatedly over each of the morphine/methadone rows, so it's somewhat inefficient, but unless you're working with a huge dataset it should get the job done quickly enough that this will be insignificant. (But unlike the other answer, it is only iterating multiple times within each patient's morphine/methadone doses rather than through all possible combinations of rows)