arrays pandas indexing comparison series

Comparison and indexing series of arrays with length > 1

The title sounds more complicated than the problem really is. Given the data

data = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
]

s = pd.Series(data)

I would like to retrieve the elements of s where s == np.array(['x']). The obvious way

c = np.array(['x'])
s[s==c]

does not work: the comparison raises a ValueError complaining that "'Lengths must match to compare', (5,), (1,)". I also tried

s[s=='x']

which only works if every element of s contains exactly one element itself.

Is there a way to retrieve all elements of s where s == c, without converting the elements to strings?

Solution

If a loop over the elements is acceptable, I think this is a simpler way: convert each array to a list and compare it with the target list.

out = s[s.apply(lambda x: x.tolist() == ['x'])]

out:

0    [x]
dtype: object
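For reference, here is a minimal self-contained sketch of the same idea wrapped in a function. The name select_equal is hypothetical (it is not a pandas API), and the sketch assumes every element of the series is a NumPy array.

```python
import numpy as np
import pandas as pd


def select_equal(series, target):
    """Return the rows of `series` whose array equals `target` exactly.

    `select_equal` is a hypothetical helper name, not part of pandas.
    """
    target_list = np.asarray(target).tolist()
    # List equality checks length and contents together, so arrays of a
    # different length simply compare unequal instead of raising.
    mask = series.apply(lambda arr: arr.tolist() == target_list)
    return series[mask]


data = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
]
s = pd.Series(data)

out = select_equal(s, np.array(['x']))
print(out.index.tolist())  # [0]
```

The same helper also works for longer targets: select_equal(s, np.array(['y', 'x'])) selects only the row at index 4, because the order of the elements matters in a list comparison.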

Checking with a larger example

import pandas as pd
import numpy as np

data1 = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
]  * 1000000
s1 = pd.Series(data1)

s1 has 5,000,000 rows.

c = np.array(['x'], dtype='object')
d = c.tolist()

Speed check (run in IPython, where %timeit is available)

>>> import timeit
>>> %timeit s1[s1.apply(lambda x: x.tolist() == d)]

1.38 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit s1[[np.array_equal(a, c) for a in s1]]

22.2 s ± 754 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> from functools import partial
>>> eq_c = partial(np.array_equal, c)
>>> %timeit s1[map(eq_c, s1)]


21.8 s ± 449 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
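Outside IPython the %timeit magic is not available. The following is a rough sketch of the same comparison using the standard timeit module. It assumes a much smaller series (the repeat factor of 1000 is an arbitrary choice so that it finishes quickly), so the absolute numbers will not match the 1.38 s and roughly 22 s measured above.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 5,000,000-row series above; 1000 is an arbitrary factor.
data_small = [
    np.array(['x'], dtype='object'),
    np.array(['y'], dtype='object'),
    np.array(['z'], dtype='object'),
    np.array(['x', 'z', 'y'], dtype='object'),
    np.array(['y', 'x'], dtype='object'),
] * 1000
s_small = pd.Series(data_small)

c = np.array(['x'], dtype='object')
d = c.tolist()


def by_tolist():
    # Compare plain Python lists, as in the answer above.
    return s_small[s_small.apply(lambda x: x.tolist() == d)]


def by_array_equal():
    # Compare with np.array_equal, which returns False on shape mismatch.
    return s_small[[np.array_equal(a, c) for a in s_small]]


# Both approaches should select the same rows before timings are compared.
assert by_tolist().index.equals(by_array_equal().index)

for fn in (by_tolist, by_array_equal):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```

Checking that both functions return the same rows before timing them guards against comparing the speed of two approaches that do not actually compute the same result.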