Search code examples
pythonpandasgroup-by

Search for multiple encounters across rows in pandas


I'm trying to take a dataframe of patient data and create a new df that includes their name and date if they had an encounter with three services on the same date.

first I have a dataframe

import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Charlie', 'Bob', 'Sam', 'Bob', 'Sam', 'Chris'],
                   'date': ['06-02-2023', '01-02-2023', '06-02-2023', '20-12-2022', '06-02-2023','08-06-2015', '26-08-2020'],
                   'department': ['urology', 'urology', 'oncology', 'primary care', 'radiation', 'primary care', 'oncology']})

I tried group by on the name and date with an agg function to create a list

df_group = df.groupby(['name', 'date']).agg({'department': pd.Series.unique})

For bob, this created made department contain [urology, oncology, radiation].

now when I try to search for the departments in the list, to then just find the rows that contain the departments in question, I get an error.

df_group.loc[df_group['department'].str.contains('primary care')]

for instance results in KeyError: '[nan nan nan nan nan] not in index'

I assume there is a much easier way but ultimately, I want to just get a dataframe of people with the date when they have an encounter for urology, oncology, and radiation. In the above df it would result in:

Name Date Bob 06-02-2023


Solution

  • Easy solution

    # define a set of departments to check for
    s = {'urology', 'oncology', 'radiation'}
    
    # groupby and aggregate to identify the combination
    # of name and date that has all the required departments
    out = df.groupby(['name', 'date'], as_index=False)['department'].agg(s.issubset)
    

    Result

    # out
          name        date  department
    0      Bob  06-02-2023        True
    1  Charlie  01-02-2023       False
    2    Chris  26-08-2020       False
    3      Sam  08-06-2015       False
    4      Sam  20-12-2022       False
    
    # out[out['department'] == True]
      name        date  department
    0  Bob  06-02-2023        True