I want to filter a Polars DataFrame and then get the number of rows.
What I'm doing now seems to work but feels so wrong:
let item_count = item_df
    .lazy()
    .filter(not(col("status").is_in(lit(filter))))
    .collect()?
    .shape().0;
In a subsequent DataFrame operation I need to use this value as the divisor:
.with_column(
    col("count")
        .div(lit(item_count as f64))
        .mul(lit(100.0))
        .alias("percentage"),
);
This is for a tiny dataset (tens of rows), so I'm not worried about performance, but I'd like to learn what the idiomatic way would be.
While there doesn't seem to be a predefined method on LazyFrame for this, you can use Polars expressions:
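use polars::prelude::*;
let df = df!["a" => [1, 2], "b" => [3, 4]].unwrap();
// `lazy()` consumes the DataFrame, so clone it to keep using `df` below.
// `len()` evaluates to the number of rows (here 2).
dbg!(df.clone().lazy().select([len()]).collect().unwrap());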
And to get the count as a plain number:
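// Row counts have Polars' index type, which is u32 unless the `bigidx` feature is enabled.
let count: u32 = df.lazy().select([len().alias("count")])
    .collect().unwrap()
    .column("count").unwrap()
    .u32().unwrap()
    .get(0).unwrap();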