I am working with multiple Pandas DataFrames with a similar structure and would like to create reusable filters that I can define once and then apply or combine as needed.
The only working solution I came up with so far feels clunky to me and makes it hard to combine filters with OR:
import pandas as pd
df = pd.DataFrame({"A":[1,1,2],"B":[1,2,3]})
def filter_A(df):
return df.loc[df["A"]==1]
def filter_B(df):
return df.loc[df["B"]==2]
print(filter_A(filter_B(df)).head())
I am hoping for something along the lines of
filter_A = (df["A"]==1)
filter_B = (df["B"]==2)
print(df.loc[(filter_A) & (filter_B)])
but reusable after changing the df and also applicable to other DataFrames with the same columns. Is there any cleaner or more readable way to do this?
You can use the .eval()
method, which allows for the evaluation of a string describing operations on dataframe columns:
Evaluate these string expressions on the dataframe df
.
Combine the results of these evaluations using the bitwise AND operator (&
), which performs element-wise logical AND operation.
Use the .loc
accessor to filter the dataframe based on the combined condition.
filter_A = 'A == 1'
filter_B = 'B == 2'
df.loc[df.eval(filter_A) & df.eval(filter_B)]
Output:
A B
1 1 2