Search code examples
rustfiltertime-seriesrust-polars

Filter by time range in polars rust


I've some times series data which I would like to filter out a certain time range for each day. In my case I would like to filter out everything between 09:00 - 16:00 (i.e. I want all values from 09:00:00 to 16:00:00 inclusive).

I've tried and read as much documentations I can but since most of polars documentation is written for python am I having quite hard time to find a solution.

And the documentation for polars-rust is still missing a lot of examples and explaining texts.

This is the only question I can find on the subject but it does not work:
Filter by hour of datetime in rust polars It runs but no filtering is performed or returned.

I've tried:

let df = LazyCsvReader::new(path)
        .with_parse_dates(true)
        .has_header(true)
        .finish()?
        .collect()?;

let a = df
        .lazy()
        .filter(
            col("time")
                .dt()
                .hour()
                .gt_eq(9)
                .and(col("time").dt().hour().lt_eq(16)),
        )
        .collect();

and it almost works. I get everything from 9:00-16:55.

┌─────────────────────┐
│ time                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2019-09-03 09:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:35:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:40:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:25:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:45:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:55:00 │
└─────────────────────┘

I've tried to add .and(col("time").dt().minute().eq(0) but that will affect ever hour, not just the "16-hour". I could use hour().le_eq(15) as a quick fix but I will still miss the last 5 min (15:55).

I cant seem to find any way to use if/else statement in these expressions.

Could anyone point me in the right direction?


Solution

  • One way to solve this is to get everything before 16:00 OR at exactly 16:00:

         let a = df
            .lazy()
            .filter(
                col("time")
                    .dt()
                    .hour()
                    .gt_eq(9)
                    .and(
                        // less than 16 handles everything up to 15:59
                        col("time")
                            .dt()
                            .hour()
                            .lt(16)
                            .or(
                                // include also 16:00
                                col("time")
                                    .dt()
                                    .hour()
                                    .eq(16)
                                    .and(
                                        col("time").dt().minute().eq(0)
                                    )
                            )
                    ),
            )
            .collect();