Rust polars: Filter a lazy frame by datetime range


I am trying to filter a LazyFrame using a chrono::NaiveDateTime range. Here is what I have so far:

use polars::prelude::*;
use polars_lazy::prelude::*;

pub fn keep_range_lazy(
    df: &mut DataFrame,
    start: &NaiveDateTime,
    end: &NaiveDateTime,
) -> Result<(), PolarsError> {
    assert!(
        df.get_column_names().contains(&"timestamp"),
        "Dataframe does not contain timestamp column."
    );

    df = &mut df
        .lazy()
        .filter(
            col("timestamp")
                .dt()
                .datetime()
                .gt_eq(start)
                .and(col("timestamp").dt().datetime().lt(end)),
        )
        .collect()?;

    Ok(())
}

The above code fails because the start and end variables are not convertible to Expr types:

error[E0277]: the trait bound `polars_lazy::dsl::Expr: std::convert::From<&chrono::NaiveDateTime>` is not satisfied
   --> src/utils.rs:32:24
    |
32  |                 .gt_eq(start)
    |                  ----- ^^^^^ the trait `std::convert::From<&chrono::NaiveDateTime>` is not implemented for `polars_lazy::dsl::Expr`
    |                  |
    |                  required by a bound introduced by this call
    |
    = help: the following other types implement trait `std::convert::From<T>`:
              <polars_lazy::dsl::Expr as std::convert::From<&str>>
              <polars_lazy::dsl::Expr as std::convert::From<bool>>
              <polars_lazy::dsl::Expr as std::convert::From<f32>>
              <polars_lazy::dsl::Expr as std::convert::From<f64>>
              <polars_lazy::dsl::Expr as std::convert::From<i32>>
              <polars_lazy::dsl::Expr as std::convert::From<i64>>
              <polars_lazy::dsl::Expr as std::convert::From<polars_lazy::dsl::AggExpr>>
              <polars_lazy::dsl::Expr as std::convert::From<u32>>
              <polars_lazy::dsl::Expr as std::convert::From<u64>>
    = note: required for `&chrono::NaiveDateTime` to implement `std::convert::Into<polars_lazy::dsl::Expr>`
note: required by a bound in `polars_plan::dsl::<impl polars_lazy::dsl::Expr>::gt_eq`
   --> /home/username/.cargo/registry/src/github.com-1ecc6299db9ec823/polars-plan-0.28.0/src/dsl/mod.rs:258:21
    |
258 |     pub fn gt_eq<E: Into<Expr>>(self, other: E) -> Expr {
    |                     ^^^^^^^^^^ required by this bound in `polars_plan::dsl::<impl Expr>::gt_eq`

Notes:

  • I have seen this answer, which does not suit me because it implies using .hours().minutes().seconds(), whereas there should be a way to simply use a single DateTime variable.
  • I have seen this other answer, which does not suit me either because it uses a DataFrame instead of a LazyFrame.
  • The solution does not have to work in place. The final signature of the function could just as well be pub fn get_range_lazy(df: DataFrame, start: &NaiveDateTime, end: &NaiveDateTime) -> Result<DataFrame, PolarsError>, as long as that does not imply a performance loss.

Here is the documentation for the polars DSL.


Solution

  • I love the compiler errors in Rust: they explain the problem clearly and often even point to a solution. In this case, the error tells you that the trait bound is not satisfied for &NaiveDateTime, but it also lists several integer types that do convert to Expr. So to use a NaiveDateTime, we only need to convert the values to an appropriate integer first. For example:

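    // `df` is assumed to be a LazyFrame here (for example, the result of calling `.lazy()`).
    df.filter(
        col("timestamp")
            // Lower bound (inclusive), as milliseconds since the Unix epoch.
            .gt_eq(start.timestamp_millis())
            .and(
                // Upper bound (exclusive), in the same unit.
                col("timestamp").lt(end.timestamp_millis()),
            ),
    )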
    

    Here I used timestamp_millis() because my "timestamp" column is stored in milliseconds. If your column uses a different time unit, use the matching conversion instead, such as timestamp_micros().
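
To illustrate why the unit matters, here is a minimal sketch that uses only the standard library, with no polars or chrono involved. The `Unit` enum and the `to_physical` and `keep_range` helpers are hypothetical names made up for this example; they only mimic the idea of comparing a datetime column's underlying integer values against integer bounds.

```rust
/// Time unit of a column's underlying i64 values.
/// Hypothetical enum for this sketch; it is not the polars type.
#[derive(Clone, Copy)]
enum Unit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Convert whole seconds since the Unix epoch into the integer
/// that a column with the given unit would store.
fn to_physical(epoch_secs: i64, unit: Unit) -> i64 {
    match unit {
        Unit::Milliseconds => epoch_secs * 1_000,
        Unit::Microseconds => epoch_secs * 1_000_000,
        Unit::Nanoseconds => epoch_secs * 1_000_000_000,
    }
}

/// Keep values in the half-open range [start, end),
/// mirroring `gt_eq(start).and(lt(end))` from the answer.
fn keep_range(values: &[i64], start: i64, end: i64) -> Vec<i64> {
    values
        .iter()
        .copied()
        .filter(|v| *v >= start && *v < end)
        .collect()
}

fn main() {
    // Three timestamps stored as microseconds: 10 s, 20 s and 30 s after the epoch.
    let column_us: Vec<i64> = vec![10_000_000, 20_000_000, 30_000_000];

    // Bounds expressed in the column's own unit select the expected rows.
    let ok = keep_range(
        &column_us,
        to_physical(10, Unit::Microseconds),
        to_physical(30, Unit::Microseconds),
    );

    // Bounds expressed in milliseconds are 1000x too small, so nothing matches.
    let wrong = keep_range(
        &column_us,
        to_physical(10, Unit::Milliseconds),
        to_physical(30, Unit::Milliseconds),
    );

    // prints: [10000000, 20000000] []
    println!("{:?} {:?}", ok, wrong);

    // The same instant in nanoseconds, for comparison.
    println!("{}", to_physical(10, Unit::Nanoseconds));
}
```

A mismatched unit does not produce an error. The comparison still runs and simply returns the wrong rows, so it is worth checking the column's time unit before choosing the conversion method.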