Filter dataframe based on condition before groupby

Suppose I have a dataframe like this

Create sample dataframe:

import pandas as pd
import numpy as np

data = {
    'gender': np.random.choice(['m', 'f'], size=100),
    'vaccinated': np.random.choice([0, 1], size=100),
    'got sick': np.random.choice([0, 1], size=100)
}

df = pd.DataFrame(data)

and I want to see, by gender, what proportion of vaccinated people got sick.

I've tries something like this:

df.groupby('gender').agg(lambda group: sum(group['vaccinated']==1 & group['sick']==1)
                                          /sum(group['sick']==1))

but this doesn't work because agg works on the series level. Same applies for transform. apply doesn't work either, but I'm not as clear why or how apply functions on groupby objects.

Any ideas how to accomplish this with a single line of code?

Solution

You could first filter for the vaccinated people and then group by gender and calculate the proportion of people that got sick..

df[df.vaccinated == 1].groupby("gender").agg({"got sick":"mean"})

Output:

        got sick
gender          
f       0.548387
m       0.535714

In this case the proportion is calculated based on a sample data that I've created