I have a large dataset of restaurant inspections. One inspection will trigger several code violations. I want to find out if any inspections don't contain a specific code violation (for evidence of pests). I have the data in a Pandas data frame.
I tried separating the data frame based on whether the violation for pests was included. And I tried to group by the violation code. Can't seem to figure it out.
With the pest violation as "3A", data could look like:
import pandas as pd
df = pd.DataFrame(data = {
'visit' : ['1', '1', '1', '2', '2', '3', '3'],
'violation' : ['3A', '4B', '5C', '3A', '6C', '7D', '8E']
})
visit violation
0 1 3A
1 1 4B
2 1 5C
3 2 3A
4 2 6C
5 3 7D
6 3 8E
I'd like to end up with this:
result = pd.DataFrame(data = {
'visit' : ['3', '3'], 'violation' : ['7D', '8E']
})
Out[15]:
visit violation
0 3 7D
1 3 8E
Try using:
value = '3A'
print(df.groupby('visit').filter(lambda x: all(value != i for i in x['violation'])))
Output:
violation visit
5 7D 3
6 8E 3