rust-polars

& operator in predicate (filter) - How to base the filter on multiple column values?


I am trying to filter a Polars dataframe based on the values of two columns. In the Rust user guide I could only find a predicate-based filter on a single column value.

How do I add additional columns in the predicate and return a boolean?

I am trying to keep rows where job = "Social worker" AND sex = "M" (possibly with additional columns if need be).

    use polars::{prelude::*, lazy::dsl::col};
    // use the predicate to filter (this version does not compile)
    let predicate = col("job").eq(lit("Social worker")) & col("sex").eq(lit("M"));

    let filtered_df = df
        .clone()
        .lazy()
        .filter(predicate)
        .collect()
        .unwrap();

The compiler reports that the & operator cannot be used here, so the two conditions are never checked.

    error[E0369]: no implementation for `Expr & Expr`
       |
    23 |     let predicate = col("job").eq(lit("Social worker")) & col("sex").eq(lit("M"));
       |                     ----------------------------------- ^ ----------------------- Expr
       |                     |
       |                     Expr
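
For context, E0369 means the compiler found no implementation of the operator's trait (here `std::ops::BitAnd`) for the operand type. The sketch below shows that mechanism using only the standard library. `Pred`, `eq`, `and`, and `matches` are hypothetical stand-ins invented for illustration and are not Polars types or functions. A named combinator method needs no operator trait, while `&` only becomes usable once `BitAnd` is implemented for the type.

```rust
use std::ops::BitAnd;

// Hypothetical stand-in for an expression type; this is NOT the Polars `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    // Holds when the named column equals the given value.
    Eq(String, String),
    // Holds when both sub-predicates hold.
    And(Box<Pred>, Box<Pred>),
}

// Hypothetical helper that builds a "column == value" predicate.
fn eq(col: &str, val: &str) -> Pred {
    Pred::Eq(col.to_string(), val.to_string())
}

impl Pred {
    // Named combinator: works without any operator trait.
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }

    // Evaluate the predicate against one row given as (column, value) pairs.
    fn matches(&self, row: &[(&str, &str)]) -> bool {
        match self {
            Pred::Eq(col, val) => row
                .iter()
                .any(|(c, v)| *c == col.as_str() && *v == val.as_str()),
            Pred::And(a, b) => a.matches(row) && b.matches(row),
        }
    }
}

// Without this impl, `pred_a & pred_b` is rejected with E0369.
impl BitAnd for Pred {
    type Output = Pred;

    fn bitand(self, rhs: Pred) -> Pred {
        self.and(rhs)
    }
}
```

With these definitions, `eq("job", "Social worker").and(eq("sex", "M"))` and `eq("job", "Social worker") & eq("sex", "M")` build the same value. Removing the `BitAnd` impl leaves only the method form compiling.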

If I only use one condition the filter works as required. The following code returns the correct output.

    // use the predicate to filter
    let predicate = col("job").eq(lit("Social worker"));

    let filtered_df = df
        .clone()
        .lazy()
        .filter(predicate)
        .collect()
        .unwrap();

Thank you for your help and reply.

I have tried the & operator, .and(), and assert!; none of them worked.


Solution

  • A Reddit user named Patryk27 gave me a heads-up (docs) and suggested trying the expr.and(bar) syntax, and it worked!

    If your filter is based on multiple column values, you can build the predicate as follows. I wish this were part of the Polars documentation as well.

    // use the predicate to filter: chain one .and() per additional condition
    let predicate = col("job")
        .eq(lit("Social worker"))
        .and(col("sex").eq(lit("F")))
        .and(col("name").eq(lit("Elizabeth Walsh")));

    let filtered_df = df
        .clone()
        .lazy()
        .filter(predicate)
        .collect()
        .unwrap();