I've some times series data which I would like to filter out a certain time range for each day. In my case I would like to filter out everything between 09:00 - 16:00 (i.e. I want all values from 09:00:00 to 16:00:00 inclusive).
I've tried and read as much documentations I can but since most of polars documentation is written for python am I having quite hard time to find a solution.
And the documentation for polars-rust is still missing a lot of examples and explaining texts.
This is the only question I can find on the subject but it does not work:
Filter by hour of datetime in rust polars
It runs but no filtering is performed or returned.
I've tried:
let df = LazyCsvReader::new(path)
.with_parse_dates(true)
.has_header(true)
.finish()?
.collect()?;
let a = df
.lazy()
.filter(
col("time")
.dt()
.hour()
.gt_eq(9)
.and(col("time").dt().hour().lt_eq(16)),
)
.collect();
and it almost works. I get everything from 9:00-16:55.
┌─────────────────────┐
│ time │
│ --- │
│ datetime[μs] │
╞═════════════════════╡
│ 2019-09-03 09:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:35:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2019-09-03 09:40:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:25:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:30:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:45:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2022-03-18 16:55:00 │
└─────────────────────┘
I've tried to add .and(col("time").dt().minute().eq(0)
but that will affect ever hour, not just the "16-hour". I could use hour().le_eq(15)
as a quick fix but I will still miss the last 5 min (15:55).
I cant seem to find any way to use if/else statement in these expressions.
Could anyone point me in the right direction?
One way to solve this is to get everything before 16:00 OR at exactly 16:00:
let a = df
.lazy()
.filter(
col("time")
.dt()
.hour()
.gt_eq(9)
.and(
// less than 16 handles everything up to 15:59
col("time")
.dt()
.hour()
.lt(16)
.or(
// include also 16:00
col("time")
.dt()
.hour()
.eq(16)
.and(
col("time").dt().minute().eq(0)
)
)
),
)
.collect();