python pandas group-by

Filter out groups in pandas based on values in groups


Currently I have a DataFrame that looks something like this:

Session Id  Time   Event           Data
1           10:00  Btn click       foo
1           11:00  Identification  bar
2           ...    Btn click       foo
2           ...    Btn click       foo
3           ..     Identification  bar

I want to group my data by Session Id and process the groups further, but only the groups that contain an Identification event.

Currently my solution looks like this:

for _, session_df in df.groupby('Session Id'):
    # Process a session only if it contains an Identification event
    if 'Identification' in session_df['Event'].values:
        process(session_df)

I tried to use filter on the groupby instead, but it does not work as I expected:

for session in df.groupby('Session Id').filter(lambda s: 'Identification' in s['Event'].values):
    process(session[1])

Solution

  • Code

    Build a boolean condition with groupby + transform:

    cond = df['Event'].eq('Identification').groupby(df['Session Id']).transform('sum').gt(0)
    out = df[cond]
    

    out:

        Session Id  Time    Event           Data
    0   1          10:00    Btn click       foo
    1   1          11:00    Identification  bar
    4   3          ..       Identification  bar
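
The condition above can be read in two steps. The following is a minimal sketch of that reading, not the only way to write it: it uses a reduced copy of the example data (only the two columns the mask needs) and swaps `transform('sum').gt(0)` for `transform('any')`, which should give the same mask for a boolean column. The names `is_ident` and `cond_any` are made up for illustration.

```python
import pandas as pd

# Reduced copy of the example data: only the columns the mask needs.
df = pd.DataFrame({
    'Session Id': [1, 1, 2, 2, 3],
    'Event': ['Btn click', 'Identification', 'Btn click',
              'Btn click', 'Identification'],
})

# Step 1: True on every row that is itself an Identification event.
is_ident = df['Event'].eq('Identification')

# Step 2: broadcast "does my session contain such a row?" back onto each row.
# transform returns a Series aligned with df, so it can be used as a mask.
cond_any = is_ident.groupby(df['Session Id']).transform('any')

out = df[cond_any]
print(out)
```

Because the mask keeps the original index, the surviving rows keep their original labels.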
    

    If you prefer groupby + filter, use the following code:

    df.groupby('Session Id').filter(lambda x: x['Event'].eq('Identification').sum() > 0)
    

    This gives the same result.
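
Note that `filter` returns a plain DataFrame, not a groupby object, so iterating over its result yields column labels rather than groups. To process each remaining session separately, one option is to group the filtered frame again. The sketch below assumes that approach; `process` here is a hypothetical stand-in for the function in the question (it only counts rows), and `has_identification` is a helper name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Session Id': [1, 1, 2, 2, 3],
    'Time': ['10:00', '11:00', '...', '...', '..'],
    'Event': ['Btn click', 'Identification', 'Btn click',
              'Btn click', 'Identification'],
    'Data': ['foo', 'bar', 'foo', 'foo', 'bar'],
})


def process(session_df):
    # Hypothetical stand-in for the question's process(): counts the rows.
    return len(session_df)


def has_identification(group):
    # True if at least one row of the session is an Identification event.
    return group['Event'].eq('Identification').any()


# Drop sessions without an Identification event; the result is a DataFrame.
kept = df.groupby('Session Id').filter(has_identification)

# Group the surviving rows again to get one sub-frame per session.
results = {sid: process(g) for sid, g in kept.groupby('Session Id')}
print(results)
```

Regrouping costs a second pass over the data; for small frames that is unlikely to matter, and the loop from the question remains a reasonable alternative.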


    Example Code

    import pandas as pd
    data1 = {'Session Id': [1, 1, 2, 2, 3], 
             'Time': ['10:00', '11:00', '...', '...', '..'], 
             'Event': ['Btn click', 'Identification', 'Btn click', 'Btn click', 'Identification'], 
             'Data': ['foo', 'bar', 'foo', 'foo', 'bar']}
    df = pd.DataFrame(data1)