I've been trying to use LazyFrames instead of DataFrames more often for performance reasons. Unfortunately, not all features available on DataFrame are available on LazyFrame. One of these is the .fill_null method, which takes a FillNullStrategy on DataFrame but only a generic E where E: Into<Expr> on LazyFrame.
Today, I tried extensively to replicate the behavior of a FillNullStrategy on a LazyFrame, to no avail, with something like this:
lf.fill_null(
when(col("*").is_null())
.then(col("*").shift(Some(1)))
.otherwise(col(name)),
)
That didn't work when .collect()ing the LazyFrame, though.
I've noticed that this feature exists in Polars for Python (docs), but not for Rust. Since I assume the Polars team wouldn't implement such functionality simply by .collect()ing the LazyFrame and then .lazy()ing it back, I believe I am missing something simpler here.
Does anybody have any insight on this?
This took a little digging, but it looks like the Python version also exposes explicit methods for some fill strategies. It looks like these are also exposed in the Rust APIs. Here's the code for documentation: https://github.com/pola-rs/polars/blob/275178c25b4bebf2f2c8a88993d445b5aabc8cc9/polars/polars-lazy/polars-plan/src/dsl/mod.rs#L782
Here's an example of the 'backward' strategy:
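use polars::prelude::*;

let df = DataFrame::new(vec![
    Series::new("data", vec![Some(1.0), None, Some(3.0), Some(4.0)]),
])
.unwrap();
// backward_fill(Some(1)) fills at most one consecutive null from the next valid value
let result = df.lazy().fill_null(col("*").backward_fill(Some(1))).collect();
println!("{:?}", result);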
and the result:
Ok(shape: (4, 1)
┌──────┐
│ data │
│ --- │
│ f64 │
╞══════╡
│ 1.0 │
│ 3.0 │
│ 3.0 │
│ 4.0 │
└──────┘)
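To make the semantics of that result concrete, here is a minimal plain-Rust sketch of a backward fill with a limit. It does not use Polars and is not how Polars implements it; the function name `backward_fill` and the `Option<usize>` limit are chosen only to mirror the call above, and the assumption is that the limit caps how many consecutive nulls get filled.

```rust
/// Backward fill over a slice of optional values (illustrative sketch only,
/// not the Polars implementation). Each null takes the nearest non-null value
/// to its right; `limit` caps how many consecutive nulls are filled.
fn backward_fill(values: &[Option<f64>], limit: Option<usize>) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    let mut next_valid: Option<f64> = None; // nearest non-null value to the right
    let mut filled_in_run = 0usize; // nulls filled since that value was seen

    // Walk from the end so the "next" value is always known.
    for i in (0..out.len()).rev() {
        match out[i] {
            Some(v) => {
                next_valid = Some(v);
                filled_in_run = 0;
            }
            None => {
                let under_limit = limit.map_or(true, |l| filled_in_run < l);
                if under_limit {
                    if let Some(v) = next_valid {
                        out[i] = Some(v);
                        filled_in_run += 1;
                    }
                }
            }
        }
    }
    out
}
```

Calling `backward_fill(&[Some(1.0), None, Some(3.0), Some(4.0)], Some(1))` yields the same values as the table above. A null with no valid value after it, or one beyond the limit, stays null.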
There is also a forward_fill method available. For setting literal values, you can simply pass lit(value) as the fill expression. Similarly, if you want to fill with the max, min, etc., you can use col("*").max() (or .min(), etc.).
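As a rough illustration of what these alternatives compute, the sketch below applies forward, literal, max, and min fills to a plain slice. Everything here is hypothetical and for illustration: the `Fill` enum and `fill_nulls` function are not part of Polars, and the forward variant has no limit in this sketch.

```rust
/// Hypothetical strategy enum for illustration; this is not Polars' FillNullStrategy.
#[derive(Debug, Clone, Copy)]
enum Fill {
    Forward,      // carry the last seen value forward (no limit in this sketch)
    Literal(f64), // replace every null with a fixed value
    Max,          // replace every null with the column maximum
    Min,          // replace every null with the column minimum
}

/// Fill nulls in a slice according to `strategy` (plain-Rust sketch, no Polars).
fn fill_nulls(values: &[Option<f64>], strategy: Fill) -> Vec<Option<f64>> {
    // Reduce the non-null values with `pick`; None if there are no non-null values.
    let reduce = |pick: fn(f64, f64) -> f64| -> Option<f64> {
        values
            .iter()
            .flatten()
            .copied()
            .fold(None, |acc: Option<f64>, v| Some(acc.map_or(v, |a| pick(a, v))))
    };

    match strategy {
        Fill::Forward => {
            let mut last: Option<f64> = None;
            values
                .iter()
                .map(|v| {
                    if v.is_some() {
                        last = *v;
                    }
                    last
                })
                .collect()
        }
        Fill::Literal(x) => values.iter().map(|v| v.or(Some(x))).collect(),
        Fill::Max => {
            let m = reduce(f64::max);
            values.iter().map(|v| v.or(m)).collect()
        }
        Fill::Min => {
            let m = reduce(f64::min);
            values.iter().map(|v| v.or(m)).collect()
        }
    }
}
```

For example, with the input `[Some(1.0), None, Some(3.0), None]`, `Fill::Forward` gives `[1.0, 1.0, 3.0, 3.0]` and `Fill::Max` gives `[1.0, 3.0, 3.0, 3.0]`. A leading null under `Fill::Forward` stays null because there is nothing before it to carry forward.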
If you want to use fill_null directly with a FillNullStrategy, then, as mentioned by jqurious, that is available on Series (and, as the question notes, on DataFrame) rather than on LazyFrame or Expr. But it looks like you can accomplish most, if not all, of the strategies using the above approaches.