Search code examples
pythonpandasfilter

Creating reusable and composable filters for Pandas DataFrames


I am working with multiple Pandas DataFrames with a similar structure and would like to create reusable filters that I can define once and then apply or combine as needed.

The only working solution I came up with so far feels clunky to me and makes it hard to combine filters with OR:

import pandas as pd
df = pd.DataFrame({"A":[1,1,2],"B":[1,2,3]})

def filter_A(df):
    return df.loc[df["A"]==1]

def filter_B(df):
    return df.loc[df["B"]==2]

print(filter_A(filter_B(df)).head())

I am hoping for something along the lines of

filter_A = (df["A"]==1)
filter_B = (df["B"]==2)

print(df.loc[(filter_A) & (filter_B)])

but reusable after changing the df and also applicable to other DataFrames with the same columns. Is there any cleaner or more readable way to do this?


Solution

  • You can use the .eval() method, which allows for the evaluation of a string describing operations on dataframe columns:

    1. Evaluate these string expressions on the dataframe df.

    2. Combine the results of these evaluations using the bitwise AND operator (&), which performs element-wise logical AND operation.

    3. Use the .loc accessor to filter the dataframe based on the combined condition.

    filter_A = 'A == 1'
    filter_B = 'B == 2'
    df.loc[df.eval(filter_A) & df.eval(filter_B)]
    

    Output:

       A  B
    1  1  2