
How do I filter a polars DataFrame by checking whether a column's value is contained in a vector?


I have a DataFrame with a column "ID" of type UInt32, and a vector named ids. I want to return a DataFrame containing only the rows whose "ID" value is contained in ids.

Minimal example of what I want:

use polars::df;
use polars::prelude::*;

fn filter_by_id(table: &DataFrame, ids: Vec<u32>) -> DataFrame {
    // Placeholder: hard-codes the result I want to compute from `table` and `ids`
    df!{
        "ID" => &[1u32, 3, 5],
        "VALUE" => &["B", "D", "F"]
    }.unwrap()
}

fn main() {
    let table = df!{
        "ID" => &[0u32, 1, 2, 3, 4, 5],
        "VALUE" => &["A", "B", "C", "D", "E", "F"]
    }.unwrap();
    let ids = vec![1, 3, 5];
    let filtered_table = filter_by_id(&table, ids);
    println!("{:?}", table);
    println!("{:?}", filtered_table);
}

table =

ID VALUE
0 A
1 B
2 C
3 D
4 E
5 F

filter vector = [1, 3, 5]

wanted output =

ID VALUE
1 B
3 D
5 F

Solution

  • Polars operates mostly on Series and Expr types, so by converting your Vec to a Series you can accomplish this task relatively easily.

    
    use polars::df;
    use polars::prelude::*;
    
    fn main() {
        let table = df!{
            "ID" => &[0u32, 1, 2, 3, 4, 5],
            "VALUE" => &["A", "B", "C", "D", "E", "F"]
        }.unwrap();
        // use u32 so the values match the UInt32 "ID" column
        let ids: Vec<u32> = vec![1, 3, 5];
        // convert the vec to `Series`
        let ids_series = Series::new("ID", ids);
        // create a filter expression
        let filter_expr = col("ID").is_in(lit(ids_series));
        // filter the dataframe on the expression
        let filtered = table.lazy().filter(filter_expr).collect().unwrap();
        println!("{:?}", filtered);
    }
    

    Note: you need to enable the lazy and is_in features of polars:

    cargo add polars --features lazy,is_in
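Conceptually, `is_in` produces a boolean mask with one entry per row, and `filter` keeps the rows where that mask is true. The sketch below reproduces those two steps in plain Rust using only the standard library, with the example table stored as two parallel vectors. `is_in_mask` and `apply_mask` are hypothetical helpers for illustration only, not polars APIs, and polars' internal implementation may well differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `col("ID").is_in(...)`: one bool per row,
/// true when that row's ID appears in `ids`.
fn is_in_mask(id_column: &[u32], ids: &[u32]) -> Vec<bool> {
    // A HashSet gives average O(1) membership checks per row.
    let lookup: HashSet<u32> = ids.iter().copied().collect();
    id_column.iter().map(|id| lookup.contains(id)).collect()
}

/// Hypothetical stand-in for `filter`: keep the values whose mask entry is true.
fn apply_mask<T: Clone>(column: &[T], mask: &[bool]) -> Vec<T> {
    column
        .iter()
        .zip(mask)
        .filter_map(|(value, &keep)| if keep { Some(value.clone()) } else { None })
        .collect()
}

fn main() {
    // The example table from the question, stored as two parallel columns.
    let id_column: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
    let value_column = vec!["A", "B", "C", "D", "E", "F"];
    let ids = vec![1, 3, 5];

    let mask = is_in_mask(&id_column, &ids);
    // The same mask is applied to every column so the rows stay aligned.
    let filtered_ids = apply_mask(&id_column, &mask);
    let filtered_values = apply_mask(&value_column, &mask);

    println!("{:?}", mask); // [false, true, false, true, false, true]
    println!("{:?}", filtered_ids); // [1, 3, 5]
    println!("{:?}", filtered_values); // ["B", "D", "F"]
}
```

Applying one shared mask to every column is what keeps "ID" and "VALUE" paired after filtering, which is the behavior the lazy `filter` call above provides for the whole DataFrame.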