python, pandas, indexing, boolean

pandas: boolean indexing using a list of boolean series


Say you have a list of pandas Series objects, each series being of boolean dtype:

boolean_series_list = [s1, s2, s3, ..., sn]

You have another series s which has the same index as all the boolean series in boolean_series_list, and you want to index it to return only values for which True appears at the corresponding index of any of the series in boolean_series_list. How do you do that?

I know the | operator can be used to combine such series:

s[s1|s2]

but how do you do this for the entire list of such series without manually rolling it out into s[s1|s2|s3|...|sn]? Something like:

cond = boolean_series_list[0]
for series in boolean_series_list[1:]:
    cond = cond | series
s[cond]

works, but it seems relatively clunky considering the typically neat high-level interface that pandas provides for working with boolean Series, such as the | operator itself, even though Series objects aren't actual booleans. With actual booleans in Python, you can just use the built-in any() function, but any(boolean_series_list) raises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

so is there a neat equivalent to any() for pandas objects?

The same question applies to the & operator, whose counterpart for actual booleans is the built-in all().


Solution

  • You can first gather all the boolean series into one DataFrame with pd.concat, then use any(axis=1) to reduce each row of conditions to a single boolean:

    cond = pd.concat(boolean_series_list, axis=1).any(axis=1)
    s[cond]
    

    To get the equivalent of chained & operators, use all(axis=1) instead of any(axis=1) in the same way.
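
For a concrete check, here is a minimal sketch of the concat approach on made-up data. The values of s, s1, s2 and s3 are hypothetical and only serve as an illustration. As in the question, every boolean series is assumed to share the index of s. The last few lines compare the result with the explicit loop from the question.

```python
import pandas as pd

# Hypothetical sample data; all series share the index of s.
s = pd.Series([10, 20, 30, 40], index=list("abcd"))
s1 = pd.Series([True, False, True, False], index=s.index)
s2 = pd.Series([True, True, False, False], index=s.index)
s3 = pd.Series([True, False, False, False], index=s.index)
boolean_series_list = [s1, s2, s3]

# One column per boolean series, aligned on the shared index.
mask_df = pd.concat(boolean_series_list, axis=1)

any_cond = mask_df.any(axis=1)  # row-wise OR, like s1 | s2 | s3
all_cond = mask_df.all(axis=1)  # row-wise AND, like s1 & s2 & s3

any_result = s[any_cond]  # keeps the values at a, b and c
all_result = s[all_cond]  # keeps only the value at a

# Cross-check against the explicit loop from the question.
loop_cond = boolean_series_list[0]
for series in boolean_series_list[1:]:
    loop_cond = loop_cond | series
matches_loop = any_cond.equals(loop_cond)
```

Because pd.concat aligns its inputs on the index, series with differing indexes would introduce missing values in mask_df, so this sketch relies on the shared-index assumption stated in the question.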