I have an NumPy array of good animals, and a DataFrame of people with a list of animals they own.
good_animals = np.array(['Owl', 'Dragon', 'Shark', 'Cat', 'Unicorn', 'Penguin'])
data = {
> 'People': [1, 2, 3, 4, 5],
> 'Animals': [['Owl'], ['Owl', 'Dragon'], ['Dog', 'Human'], ['Unicorn', 'Pitbull'], []],
> }
df = pd.DataFrame(data)
I want to add another column to my DataFrame, showing all the good animals that person owns.
The following gives me a Series showing whether or not each animal is a good animal.
df['Animals'].apply(lambda x: np.isin(x, good_animals))
But I want to see the actual good animals, not just booleans.
You can use intersection
of sets from lists:
df['new'] = df['Animals'].apply(lambda x: list(set(x).intersection(good_animals)))
print (df)
People Animals new
0 1 [Owl] [Owl]
1 2 [Owl, Dragon] [Dragon, Owl]
2 3 [Dog, Human] []
3 4 [Unicorn, Pitbull] [Unicorn]
4 5 [] []
If possible duplciated values or if order is important use list comprehension:
s = set(good_animals)
df['new'] = df['Animals'].apply(lambda x: [y for y in x if y in s])
print (df)
People Animals new
0 1 [Owl] [Owl]
1 2 [Owl, Dragon] [Owl, Dragon]
2 3 [Dog, Human] []
3 4 [Unicorn, Pitbull] [Unicorn]
4 5 [] []