Python 3.6, pandas 1.1.2 on Windows 10.
I am trying to find a scalable way to slice a pandas DataFrame by value selection. Example:
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'c', 'c'],
                   'c2': ['c', 'c', 'c'],
                   'c3': ['b', 'c', 'c']})
gives:
  c1 c2 c3
0  a  c  b
1  c  c  c
2  c  c  c
Now I slice the DataFrame to keep only the rows where both columns c1 and c3 contain either "a" or "b":
criteria = ['a', 'b']
new_df = df[((df.c1 == criteria[0]) | (df.c1 == criteria[1])) &
            ((df.c3 == criteria[0]) | (df.c3 == criteria[1]))]
gives:
  c1 c2 c3
0  a  c  b
The issue is that if my criteria list is longer than just these two elements ('a', 'b'), writing out one comparison per value and per column becomes hard to scale.
You can try isin, then all:
# more values here
vals = ['a', 'b', 'e', 'f', 'g', 'h']
df[df[['c1', 'c3']].isin(vals).all(axis=1)]
Output:
  c1 c2 c3
0  a  c  b
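To make this reusable for any set of columns and values, the isin/all pattern can be wrapped in a small helper. This is only a sketch: the function name filter_rows and its parameters are made up for illustration and are not part of pandas. It also checks that the result matches the chained comparison from the question for the two-value case.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'c', 'c'],
                   'c2': ['c', 'c', 'c'],
                   'c3': ['b', 'c', 'c']})


def filter_rows(frame, columns, allowed):
    """Hypothetical helper: keep rows where every listed column
    holds one of the allowed values."""
    # isin gives a boolean DataFrame; all(axis=1) collapses it to one
    # boolean per row that is True only if every column matched.
    mask = frame[columns].isin(allowed).all(axis=1)
    return frame[mask]


result = filter_rows(df, ['c1', 'c3'], ['a', 'b'])

# The hand-written version from the question, for comparison
manual = df[((df.c1 == 'a') | (df.c1 == 'b')) &
            ((df.c3 == 'a') | (df.c3 == 'b'))]

print(result)
print(result.equals(manual))
```

If a row should be kept when at least one of the columns matches, rather than all of them, swapping .all(axis=1) for .any(axis=1) should do it.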