I am trying to group a dataframe by the year of its date column. First, let's create a dataframe:
let s0 = Series::new("date", &["2021-01-14","2022-04-09","2021-06-24","2022-12-04","2022-11-25"]);
let s1 = Series::new("values", &[1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;
df.try_apply("date", |col_series| {
    // Parse the "YYYY-MM-DD" strings into a Date column (the as_date signature differs between Polars versions).
    Ok(col_series.utf8()?.as_date(Some("%Y-%m-%d"))?.into_series())
})?;
let lf = df.lazy();
And then here's the (non-working) code for what I would like to achieve:
lf.groupby([col("date").year()]).agg([col("values").sum()]).collect()
We can reach the date namespace of our "date" column by calling col("date").dt(). This makes the year() function available.
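Polars stores a Date as an i32 count of days since 1970-01-01, so a year() accessor has to turn that day count back into a calendar date. The plain-Rust sketch below (std only, no Polars) illustrates that calendar arithmetic using Howard Hinnant's civil_from_days algorithm. It is not Polars' actual implementation; civil_from_days and year_of are hypothetical helpers written for this illustration.

```rust
/// Convert a count of days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar (Howard Hinnant's `civil_from_days`).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year eras
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year counted from March 1, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index with March = 0, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    // January and February belong to the following civil year.
    let year = yoe as i64 + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: the year of a date stored as days since the epoch.
fn year_of(days_since_epoch: i32) -> i64 {
    civil_from_days(days_since_epoch as i64).0
}

fn main() {
    // 0 is 1970-01-01, -1 is 1969-12-31, 365 is 1971-01-01, 18_262 is 2020-01-01.
    for days in [0, -1, 365, 18_262] {
        println!("{days} -> {:?} (year {})", civil_from_days(days as i64), year_of(days));
    }
}
```

The era-based formulation avoids loops and handles dates before 1970 (negative day counts) the same way as later ones.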
In your case the correct code would be:
let out = lf.groupby([col("date").dt().year()])
    .agg([col("values").sum()])
    .collect()?;
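To check what the corrected query should return, here is a plain-Rust sketch (std only, no Polars) of the same aggregation over the sample data: take the year from each "YYYY-MM-DD" string and sum the values per year. sum_by_year is a hypothetical helper; it uses a BTreeMap so years come out sorted, whereas Polars' groupby does not guarantee the order of groups.

```rust
use std::collections::BTreeMap;

/// Sum `values` grouped by the year prefix of "YYYY-MM-DD" strings.
/// Returns `None` if the slices differ in length or a year fails to parse.
fn sum_by_year(dates: &[&str], values: &[i64]) -> Option<BTreeMap<i32, i64>> {
    if dates.len() != values.len() {
        return None;
    }
    let mut sums = BTreeMap::new();
    for (date, value) in dates.iter().zip(values) {
        // The year is everything before the first '-'.
        let year: i32 = date.split('-').next()?.parse().ok()?;
        *sums.entry(year).or_insert(0) += value;
    }
    Some(sums)
}

fn main() {
    // Same data as the dataframe in the question.
    let dates = ["2021-01-14", "2022-04-09", "2021-06-24", "2022-12-04", "2022-11-25"];
    let values = [1, 2, 3, 4, 5];
    if let Some(sums) = sum_by_year(&dates, &values) {
        for (year, total) in &sums {
            println!("{year}: {total}");
        }
    }
}
```

For the sample data it prints 2021: 4 and 2022: 11, which are the sums the Polars query should produce (possibly with the rows in a different order).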
This was taken from the filtering section of the groupby page in the Python docs:
https://pola-rs.github.io/polars-book/user-guide/dsl/groupby.html#filtering