Search code examples
pythonpandasdataframepandas-groupby

Filter out rows that are the same in one column but have multiple values in another columns respectively in dataframe


To say I have a dataframe with three columns like:

index A B C
1 foo One 1
2 foo Two 2
3 foo Three 3
4 bar One 2
5 bar One 1
6 num Two 3
7 num Three 3

In this case, how may I filter out the rows that have the same value in column B but more than one respective value in column C by using Python Pandas?

The rows that I need is 1, 2, 4, 5, 6 because "One" in column B has two corresponding values (1 and 2) in column C and "Two" in column B has two corresponding values as well. Eventually I want to group them by column A if possible.


Solution

  • You can try groupby B column then filter by the value_counts of C column.

    out = df.groupby('B').filter(lambda group: len(group['C'].value_counts()) > 1)
    
    print(out)
    
       index    A    B  C
    0      1  foo  One  1
    1      2  foo  Two  2
    3      4  bar  One  2
    4      5  bar  One  1
    5      6  num  Two  3