I'm trying to take a dataframe of patient data and create a new df that includes their name and date if they had an encounter with three services on the same date.
first I have a dataframe
import pandas as pd
df = pd.DataFrame({'name': ['Bob', 'Charlie', 'Bob', 'Sam', 'Bob', 'Sam', 'Chris'],
'date': ['06-02-2023', '01-02-2023', '06-02-2023', '20-12-2022', '06-02-2023','08-06-2015', '26-08-2020'],
'department': ['urology', 'urology', 'oncology', 'primary care', 'radiation', 'primary care', 'oncology']})
I tried group by on the name and date with an agg function to create a list
df_group = df.groupby(['name', 'date']).agg({'department': pd.Series.unique})
For bob, this created made department contain [urology, oncology, radiation].
now when I try to search for the departments in the list, to then just find the rows that contain the departments in question, I get an error.
df_group.loc[df_group['department'].str.contains('primary care')]
for instance results in KeyError: '[nan nan nan nan nan] not in index'
I assume there is a much easier way but ultimately, I want to just get a dataframe of people with the date when they have an encounter for urology, oncology, and radiation. In the above df it would result in:
Name Date Bob 06-02-2023
# define a set of departments to check for
s = {'urology', 'oncology', 'radiation'}
# groupby and aggregate to identify the combination
# of name and date that has all the required departments
out = df.groupby(['name', 'date'], as_index=False)['department'].agg(s.issubset)
# out
name date department
0 Bob 06-02-2023 True
1 Charlie 01-02-2023 False
2 Chris 26-08-2020 False
3 Sam 08-06-2015 False
4 Sam 20-12-2022 False
# out[out['department'] == True]
name date department
0 Bob 06-02-2023 True