I have a pandas data frame with column names as 'ID', 'datetime', 'medication'. Now I am trying to eliminate all the morphine medication rows with respect to methadone medication given time.
For example:
ID-66531 received first methadone medication at time 2013-02-19 20:54. Since methadone and morphine, both are pain killers the effect of the medications are the same. So I am trying to eliminate morphine medication rows if methadone was given within 24hr period of morphine medication for each ID group. So for the ID-66531 if any morphine falls between the 24 hour time range of methadone given which is 2013-02-18 20:53 and 2013-02-19 20:54 in this case, I want to remove those record for each ID group
Actual dataframe:
64347 2012-02-15 19:24:00 morphine
64347 2012-02-15 22:54:00 morphine
64347 2012-02-16 05:50:00 morphine
64347 2012-02-16 12:47:00 oxyCODONE
64347 2012-10-20 05:03:00 morphine
64347 2012-10-21 15:10:00 morphine
64347 2012-10-21 19:06:00 morphine
64347 2012-11-16 16:00:00 morphine
64990 2012-09-21 07:10:00 oxyCODONE
64990 2012-09-21 16:45:00 fentaNYL
64990 2012-09-21 23:21:00 oxyCODONE
64990 2012-09-22 15:40:00 oxyCODONE
66531 2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531 2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531 2013-02-18 22:36:00 morphine
66531 2013-02-19 05:07:00 morphine
66531 2013-02-19 10:13:00 morphine
66531 2013-02-19 15:27:00 morphine
66531 2013-02-19 17:33:00 oxyCODONE
66531 2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531 2013-02-19 20:54:00 methadone
66531 2013-02-20 06:46:00 methadone
66531 2013-02-20 13:21:00 methadone
66531 2013-02-20 22:18:00 methadone
Expected data frame:
64347 2012-02-15 19:24:00 morphine
64347 2012-02-15 22:54:00 morphine
64347 2012-02-16 05:50:00 morphine
64347 2012-02-16 12:47:00 oxyCODONE
64347 2012-10-20 05:03:00 morphine
64347 2012-10-21 15:10:00 morphine
64347 2012-10-21 19:06:00 morphine
64347 2012-11-16 16:00:00 morphine
64990 2012-09-21 07:10:00 oxyCODONE
64990 2012-09-21 16:45:00 fentaNYL
64990 2012-09-21 23:21:00 oxyCODONE
64990 2012-09-22 15:40:00 oxyCODONE
66531 2012-12-18 08:31:00 acetaminophen-HYDROcodone
66531 2012-12-18 17:44:00 acetaminophen-HYDROcodone
66531 2013-02-19 17:33:00 oxyCODONE
66531 2013-02-19 19:20:00 acetaminophen-HYDROcodone
66531 2013-02-19 20:54:00 methadone
66531 2013-02-20 06:46:00 methadone
66531 2013-02-20 13:21:00 methadone
66531 2013-02-20 22:18:00 methadone
What I tried:
I created a timediff column which gets the time difference of the between two records.
df['timediff'] = df.sort_values(by= 'datetime').groupby('ID')['datetime'].diff()
then I assigned 24 hours to a variable x
x = '24:00:00'
and tried the following code to remove morphine
df = df.groupby('ID').apply(lambda x : x.loc[~(x['Medication'] == 'morphine')<x]
but this code removes the morphine records irrespective of the methadone medication given time.
I'd really appreciate if I can get some help with this.
Here's what I came up with:
import pandas
df = pandas.read_csv('test.tsv', header=None, sep='\s+',
parse_dates = [['date', 'time']],
names=['patient_id', 'date', 'time','drug'])
to_drop = set()
for _, patient in df.groupby('patient_id'):
meth_times = patient.loc[patient['drug'] == 'methadone']['date_time']
morph_doses = patient.loc[patient['drug'] == 'morphine']
for i, md in morph_doses.iterrows():
for mt in meth_times:
days_elapsed = (mt - md['date_time']).days
if days_elapsed < 1:
to_drop.add(i)
break
df = df[~df.index.isin(to_drop)]
As with @ulmefors suggested answer, this is iterating repeatedly over each of the morphine/methadone rows, so it's somewhat inefficient, but unless you're working with a huge dataset it should get the job done quickly enough that this will be insignificant. (But unlike the other answer, it is only iterating multiple times within each patient's morphine/methadone doses rather than through all possible combinations of rows)