
Rust-polars: how can I pass the "other" parameter for the function "is_in" by reference?


I am very new to Rust so please excuse me if this is a trivial question.

I am trying to filter a dataframe as follows:

    let allowed = Series::from_iter(vec![
        "string1".to_string(),
        "string2".to_string(),
    ]);
    let df = LazyCsvReader::new(&fullpath)
        .has_header(true)
        .finish().unwrap()
        .filter(col("string_id").is_in(&allowed)).collect().unwrap(); 

It looks good to me since the signature of the is_in method looks like this:

    fn is_in(
        &self,
        _other: &Series
    ) -> Result<ChunkedArray<BooleanType>, PolarsError>

from [https://docs.rs/polars/latest/polars/series/trait.SeriesTrait.html#method.is_in]

However, when I compile it I get the following error:

error[E0277]: the trait bound `Expr: From<&polars::prelude::Series>` is not satisfied
    --> src/main.rs:33:40
     |
33   |         .filter(col("string_id").is_in(&allowed)).collect().unwrap();
     |                                  ----- ^^^^^^^^ the trait `From<&polars::prelude::Series>` is not implemented for `Expr`
     |                                  |
     |                                  required by a bound introduced by this call
     |
     = help: the following other types implement trait `From<T>`:
               <Expr as From<&str>>
               <Expr as From<AggExpr>>
               <Expr as From<bool>>
               <Expr as From<f32>>
               <Expr as From<f64>>
               <Expr as From<i32>>
               <Expr as From<i64>>
               <Expr as From<u32>>
               <Expr as From<u64>>
     = note: required for `&polars::prelude::Series` to implement `Into<Expr>`
note: required by a bound in `polars_plan::dsl::<impl Expr>::is_in`
    --> /home/myself/.cargo/registry/src/
     |
1393 |     pub fn is_in<E: Into<Expr>>(self, other: E) -> Self {
     |                     ^^^^^^^^^^ required by this bound in `polars_plan::dsl::<impl Expr>::is_in`

For more information about this error, try `rustc --explain E0277`.

To me this error looks very cryptic. I read the result of rustc --explain E0277 that says "You tried to use a type which doesn't implement some trait in a place which expected that trait", but this doesn't help in the slightest to identify which type doesn't implement which trait.

  • How do I fix this? Why doesn't it work?

NOTE: I know that writing lit(allowed) instead of &allowed compiles, but it moves allowed into the expression, so allowed cannot be used anywhere else afterwards. For example, I would like to do the following, but this code (obviously) fails with a "use of moved value" error:

    let df = LazyCsvReader::new(&fullpath)
        .has_header(true)
        .finish().unwrap()
        .with_column(
            when(
                col("firstcolumn").is_in(lit(allowed))
                    .and(
                    col("secondcolumn").is_in(lit(allowed))
                    )
                )
                .then(lit("very good"))
                .otherwise(lit("very bad"))
                .alias("good_bad")
        )
        .collect().unwrap();

Bonus questions:

  • Why does it work with lit(allowed)? Shouldn't I pass the variable by reference as specified in the documentation?
  • How can I repeatedly use a Series for is_in like in the example above without having an error?

EDIT: I found a different signature for is_in that requires the second parameter to be an Expr, which would justify the need for lit. However, it's still not clear how to use the same Series multiple times without getting the "use of moved value" error.


Solution

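  • The signature you found is for Series.is_in(), but you're calling Expr.is_in(), which differs. As the compiler note shows, it is declared as is_in<E: Into<Expr>>(self, other: E): it takes its argument by value and needs something convertible into an Expr. There is no From<&Series> implementation for Expr, so &allowed is rejected, whereas lit(allowed) is already an Expr.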

    You can use cols() to select multiple columns:

    .with_columns([
        cols(["firstcolumn", "secondcolumn"]).is_in(lit(allowed))
    ])
    
    ┌─────────────┬──────────────┬─────────────┐
    │ firstcolumn ┆ secondcolumn ┆ thirdcolumn │
    │ ---         ┆ ---          ┆ ---         │
    │ bool        ┆ bool         ┆ str         │
    ╞═════════════╪══════════════╪═════════════╡
    │ false       ┆ false        ┆ moo         │
    │ true        ┆ false        ┆ foo         │
    │ true        ┆ true         ┆ keepme      │
    │ true        ┆ true         ┆ andme       │
    └─────────────┴──────────────┴─────────────┘
    

    When the same expression is used inside .when(), the per-column results are combined with an implicit AND:

    ┌─────────────┬──────────────┬─────────────┬───────────┐
    │ firstcolumn ┆ secondcolumn ┆ thirdcolumn ┆ good_bad  │
    │ ---         ┆ ---          ┆ ---         ┆ ---       │
    │ str         ┆ str          ┆ str         ┆ str       │
    ╞═════════════╪══════════════╪═════════════╪═══════════╡
    │ a           ┆ b            ┆ moo         ┆ very bad  │
    │ string1     ┆ no           ┆ foo         ┆ very bad  │
    │ string2     ┆ string1      ┆ keepme      ┆ very good │
    │ string1     ┆ string2      ┆ andme       ┆ very good │
    └─────────────┴──────────────┴─────────────┴───────────┘
    

    With regard to the moved value error: I have little Rust knowledge, but the compiler tells me:

    help: consider cloning the value if the performance cost is acceptable
       |
    15 |                 col("firstcolumn").is_in(lit(allowed.clone())).and(col("secondcolumn").is_in(lit(allowed))))
       |                                                     ++++++++
    

    Cloning a Series is very cheap: a Series holds its data behind an Arc, so clone() only bumps a reference count and does not copy the values.
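
To make the ownership side of this concrete, here is a minimal std-only sketch. It does not use polars: Series, Expr, col, lit and build_condition below are toy stand-ins defined purely for illustration. The only assumptions are that the real Series keeps its data behind a reference-counted pointer and that is_in has the `E: Into<Expr>` by-value shape shown in the compiler note.

```rust
use std::sync::Arc;

// Toy stand-in for a Series (NOT the polars type): the values live behind
// an Arc, so clone() copies a pointer and bumps a counter.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Series(Arc<Vec<String>>);

// Toy stand-in for an expression tree (NOT the polars type).
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Expr {
    Column(String),
    Literal(Series),
    IsIn(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Hypothetical helpers mirroring the names used in the question.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Takes the Series by value, so the caller gives up ownership of it.
fn lit(series: Series) -> Expr {
    Expr::Literal(series)
}

impl Expr {
    // Same shape as the signature in the compiler note: `other` is taken
    // by value and must convert into an Expr. No From<&Series> exists in
    // this sketch either, so `is_in(&allowed)` would fail to compile.
    fn is_in<E: Into<Expr>>(self, other: E) -> Expr {
        Expr::IsIn(Box::new(self), Box::new(other.into()))
    }

    fn and(self, other: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(other))
    }
}

// Borrow the Series and move a cheap clone into each literal, so the
// caller keeps ownership of the original.
fn build_condition(allowed: &Series) -> Expr {
    col("firstcolumn")
        .is_in(lit(allowed.clone()))
        .and(col("secondcolumn").is_in(lit(allowed.clone())))
}
```

After calling build_condition(&allowed), allowed is still usable, and Arc::strong_count(&allowed.0) shows three owners of a single buffer: the original plus the two clones stored in the expression. In this sketch, cloning for each use costs one pointer copy rather than a copy of the strings.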