Search code examples
python-3.xpandasgroup-by

Extracting Specific Groups from a DataFrame under Specific Conditions


I have a DataFrame as below where I want to extract groups that contain a row with (Name: J and Age: 33)

X Y Name Age
1 3 J 33
1 3 A 47
1 4 B 53
1 4 X 22
2 3 J 33
2 3 P 80
2 4 V 90
2 4 V 93

Overall it would produce the table below, because the 1, 3 X/Y group contains J, 33 and the 2,3 X/Y group also contains a J, 33 row.

X Y Name Age
1 3 J 33
1 3 A 47
2 3 J 33
2 3 P 80

I've been approaching this by iterating over the rows which has been far too slow and was wondering if there was a much faster way by using the groupby and apply/pipe methods in Pandas. Any help is appreciated

Example DF below:

df = pd.DataFrame({
    'X': [1,1,1,1,2,2,2,2],
    'Y': [3,3,4,4,3,3,4,4],
    'Name': ['J', 'A', 'B', 'X', 'V', 'P', 'J', 'V'],
    'Age': [33,47,53,22,33,80,33,93]
})

Solution

  • One option is with a groupby:

    # Get rows equal to ('J', 33)
    check = df.loc(axis=1)[['Name','Age']].eq(('J', 33)).all(axis=1)
    # run a groupby and get groups where True exists for any row in that group
    check = check.groupby([df.X, df.Y]).transform('any')
    #filter original dataframe
    df.loc[check]
    
       X  Y Name  Age
    0  1  3    J   33
    1  1  3    A   47
    4  2  3    J   33
    5  2  3    P   80
    

    Another option, still with a groupby:

    group = df.groupby(['X','Y'])
    cond1 = group.Name.transform(lambda x: any(x == 'J'))
    cond2 = group.Age.transform(lambda x: any(x == 33))
    df.loc[cond1 & cond2]
       X  Y Name  Age
    0  1  3    J   33
    1  1  3    A   47
    4  2  3    J   33
    5  2  3    P   80