Tags: python, pandas, boolean-operations

Elegant method for Boolean Indexing with variable/dynamic number of values in list


Sorry for the title, I couldn't come up with one that described this issue succinctly & accurately.

Say you have a dataframe such as:

                 Time           Temp      RH     Sensor  Unit  
0        2015-12-07 00:06:00  14.912000  42.324      A     1      
1        2015-12-07 00:12:00  14.768000  42.371      A     2      
2        2015-12-07 00:18:00  14.601000  42.415      A     1
3        2015-12-07 00:24:00  14.457000  42.462      A     4
...

And you want to subset these data by the Unit column. If you know the single Unit value you want, you could do:

subset = df[df['Unit'] == 4]

...and if you wanted to subset with multiple Unit values you could do:

subset = df[(df['Unit'] == 4) | (df['Unit'] == 1)]

The problem I have is that I am using a for loop to do these operations and the number of Units included changes (length of value list varies from 1-3). In other words, imagine Unit is a list of lists that I am looping through:

for i in Unit:
    subset = df[df['Unit'] == i]
    ...

Of course, the above will work when i is a single value, but not when it is a list of multiple values. Is there a way to do this without an if statement?


Solution

  • If I understand correctly, you want to keep the rows whose column value matches any entry in a list of values? For example, take the DataFrame below:

    df
           a
    0     12
    1  65346
    2   1243
    3     63
    4    568
    5    243
    

    and you'd like to index on this list of conditions:

    conditions = [12, 568]
    

    You can use the Series method isin(), which returns a boolean mask that is True wherever the value appears in the list:

    df[df['a'].isin(conditions)]
    
         a
    0   12
    4  568
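
Applied to the loop in the question, a minimal sketch might look like the following. The sample values, the name unit_groups, and the use of np.atleast_1d to cope with a bare scalar are assumptions for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question's frame (values are illustrative).
df = pd.DataFrame({
    "Temp": [14.912, 14.768, 14.601, 14.457],
    "Unit": [1, 2, 1, 4],
})

# Hypothetical groups to loop over; an entry may be a scalar or a list.
unit_groups = [4, [4, 1], [1, 2, 4]]

subsets = []
for units in unit_groups:
    # isin() expects a list-like, so np.atleast_1d wraps a bare scalar
    # in a one-element array and leaves lists effectively unchanged.
    mask = df["Unit"].isin(np.atleast_1d(units))
    subsets.append(df[mask])
```

If every entry is already a list (even a one-element list), the np.atleast_1d call is unnecessary and df["Unit"].isin(units) is enough, so no if statement is needed in either case.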