The pandas isin() function but returning the actual values, not just a boolean


I have a NumPy array of good animals and a DataFrame of people, each with a list of the animals they own.

import numpy as np
import pandas as pd

good_animals = np.array(['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'])
data = {
    'People': [1, 2, 3, 4, 5],
    'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
}
df = pd.DataFrame(data)

I want to add another column to my DataFrame, showing all the good animals that person owns.

The following gives me a Series showing whether or not each animal is a good animal.

df['Animals'].apply(lambda x: np.isin(x, good_animals))

But I want to see the actual good animals, not just booleans.


Solution

  • Use a set intersection of each list with good_animals. Note that this drops duplicates and does not preserve the original order:

    df['new'] = df['Animals'].apply(lambda x: list(set(x).intersection(good_animals)))
    print(df)
       People             Animals            new
    0       1               [Owl]          [Owl]
    1       2       [Owl, Dragon]  [Dragon, Owl]
    2       3        [Dog, Human]             []
    3       4  [Unicorn, Pitbull]      [Unicorn]
    4       5                  []             []
    

    If the lists may contain duplicate values, or if order matters, use a list comprehension instead:

    s = set(good_animals)
    df['new'] = df['Animals'].apply(lambda x: [y for y in x if y in s])
    print(df)
       People             Animals            new
    0       1               [Owl]          [Owl]
    1       2       [Owl, Dragon]  [Owl, Dragon]
    2       3        [Dog, Human]             []
    3       4  [Unicorn, Pitbull]      [Unicorn]
    4       5                  []             []
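
To make the difference between the two approaches concrete, here is a minimal sketch on a plain Python list that contains a duplicate. The helper names `good_by_intersection` and `good_by_comprehension` and the sample list `owned` are hypothetical, chosen only for illustration; plain lists stand in for the cells of the `Animals` column.

```python
# Hypothetical helper names; plain lists stand in for the DataFrame cells.
good_animals = ['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin']
good_set = set(good_animals)  # build the set once; membership tests are fast


def good_by_intersection(owned):
    """Set intersection: drops duplicates, result order is not guaranteed."""
    return list(set(owned).intersection(good_set))


def good_by_comprehension(owned):
    """List comprehension: keeps duplicates and the original order."""
    return [animal for animal in owned if animal in good_set]


owned = ['Owl', 'Dog', 'Dragon', 'Owl']
by_set = good_by_intersection(owned)
by_list = good_by_comprehension(owned)

print(sorted(by_set))  # sorted only to get a stable display: ['Dragon', 'Owl']
print(by_list)         # ['Owl', 'Dragon', 'Owl']
```

Either function could be passed to `df['Animals'].apply(...)` in place of the lambdas above.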