Search code examples
pythonpandasdistinct-values

How to find missing values in groups


I have a large dataset of restaurant inspections. One inspection will trigger several code violations. I want to find out if any inspections don't contain a specific code violation (for evidence of pests). I have the data in a Pandas data frame.

I tried separating the data frame based on whether the violation for pests was included. And I tried to group by the violation code. Can't seem to figure it out.

With the pest violation as "3A", data could look like:

import pandas as pd

df = pd.DataFrame(data = {
    'visit' : ['1', '1', '1', '2', '2', '3', '3'],
    'violation' : ['3A', '4B', '5C', '3A', '6C', '7D', '8E']
    })
  visit violation
0     1        3A
1     1        4B
2     1        5C
3     2        3A
4     2        6C
5     3        7D
6     3        8E

I'd like to end up with this:

result = pd.DataFrame(data = {
    'visit' : ['3', '3'], 'violation' : ['7D', '8E']
    })
Out[15]: 
  visit violation
0     3        7D
1     3        8E

Solution

  • Try using:

    value = '3A'
    print(df.groupby('visit').filter(lambda x: all(value != i for i in x['violation'])))
    

    Output:

      violation visit
    5        7D     3
    6        8E     3