python pandas isin

Selecting multiple rows based on different column values


I'm trying to evaluate some images based on their classification. I read the CSV file with the code below:

import pandas as pd
file = pd.read_csv('test.csv', header=None)

So I have something that looks like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image2  4  5  6  7  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

If I want to keep the images with the value "Green", the output should look like this:

Image1  2  3  4  5  Green
Image1  3  4  5  6  Red
Image3  1  4  8  9  Green
Image4  5  3  0  1  Yellow
Image4  6  2  1  1  Green

In other words, I want to keep every row that shares an id (first column) with at least one row whose last column contains the value I'm checking for.

I used the isin method, but I don't know how to also keep the remaining rows of the images that have the value "Green" in the last column at least once.


Solution

  • We can use GroupBy.any here, where we check whether any of the rows in each group satisfy our condition (df is the DataFrame read from the CSV, named file in the question):

    df[df[5].eq("Green").groupby(df[0]).transform("any")]
    
            0  1  2  3  4       5
    0  Image1  2  3  4  5   Green
    1  Image1  3  4  5  6     Red
    3  Image3  1  4  8  9   Green
    4  Image4  5  3  0  1  Yellow
    5  Image4  6  2  1  1   Green
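
For anyone who wants to reproduce this without the original test.csv, here is a minimal self-contained sketch. It assumes the sample rows from the question, loaded from an in-memory string as whitespace-separated text (the real file's separator may differ), and the names is_green, keep, green_ids and result_isin are only illustrative. It also shows one possible isin-based variant, since the question mentions isin.

```python
import io

import pandas as pd

# Sample rows copied from the question. The whitespace separator is an
# assumption made for this sketch; adjust sep to match the real file.
raw = """Image1 2 3 4 5 Green
Image1 3 4 5 6 Red
Image2 4 5 6 7 Red
Image3 1 4 8 9 Green
Image4 5 3 0 1 Yellow
Image4 6 2 1 1 Green
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Boolean mask: True where the last column equals "Green".
is_green = df[5].eq("Green")

# Broadcast "any Green in this image's group" back to every row of the group.
keep = is_green.groupby(df[0]).transform("any")
result = df[keep]

# Alternative with isin: collect the ids that have a Green row,
# then keep every row whose id is in that collection.
green_ids = df.loc[is_green, 0]
result_isin = df[df[0].isin(green_ids)]

print(result)
```

Both variants select the same rows here. The groupby/transform form keeps everything in one expression, while the isin form may read more naturally if the list of matching ids is needed elsewhere.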