Tags: python-3.x, pandas, any

pandas slicing by value


Python 3.6, pandas 1.1.2, Windows 10.

I am trying to find a scalable way to slice a pandas DataFrame by selecting on values. Example:

import pandas as pd

df = pd.DataFrame({'c1': ['a', 'c', 'c'],
                   'c2': ['c', 'c', 'c'],
                   'c3': ['b', 'c', 'c']})

gives:

     c1 c2 c3
   0  a  c  b
   1  c  c  c
   2  c  c  c

Now I slice the DataFrame, keeping only the rows where both column c1 and column c3 contain either "a" or "b":

criteria = ['a', 'b']
new_df = df[((df.c1 == criteria[0]) | (df.c1 == criteria[1])) & \
            ((df.c3 == criteria[0]) | (df.c3 == criteria[1]))]

gives:

   c1 c2 c3
0  a  c  b

The issue is that if my criteria list is longer than just those two elements ('a', 'b'), writing out every comparison by hand does not scale.


Solution

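  • You can use isin to test each selected column against the list of values, then all along the column axis to keep only the rows where every selected column matches: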

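    # the list can hold as many values as needed
    vals = ['a', 'b', 'e', 'f', 'g', 'h']

    # isin gives a boolean frame; all(axis=1) requires a match in every selected column
    df[df[['c1', 'c3']].isin(vals).all(axis=1)]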
    

    Output:

      c1 c2 c3
    0  a  c  b
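
For reference, the same idea can be wrapped up so that both the column list and the value list are free to grow. This is only a minimal sketch: `filter_rows` is a hypothetical helper name chosen for illustration, not something pandas provides, and it assumes the columns hold plain hashable values such as strings.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'c', 'c'],
                   'c2': ['c', 'c', 'c'],
                   'c3': ['b', 'c', 'c']})


def filter_rows(frame, columns, allowed):
    """Keep rows where every listed column holds one of the allowed values.

    Hypothetical helper for illustration only.
    """
    # isin returns a boolean DataFrame with the same shape as frame[columns]
    mask = frame[columns].isin(allowed).all(axis=1)
    return frame[mask]


result = filter_rows(df, ['c1', 'c3'], ['a', 'b', 'e', 'f', 'g', 'h'])
print(result)
```

If the requirement were instead that at least one of the columns matches, replacing `all(axis=1)` with `any(axis=1)` should give that behaviour.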