Suppose I have two dataframes like:
let df_1 = df! {
"1" => [1, 2, 2, 3, 4, 3],
"2" => [1, 4, 2, 3, 4, 3],
"3" => [1, 2, 6, 3, 4, 3],
}
.unwrap();
let mut df_2 = df_1.clone();
for idx in 0..df_2.width() {
df_2.apply_at_idx(idx, |s| {
s.cummax(false)
.shift(1)
.fill_null(FillNullStrategy::Zero)
.unwrap()
})
.unwrap();
}
println!("{:#?}", df_1);
println!("{:#?}", df_2);
shape: (6, 3)
┌─────┬─────┬─────┐
│ 1 ┆ 2 ┆ 3 │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 1 ┆ 1 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 4 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 2 ┆ 6 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 3 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 4 ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 3 ┆ 3 │
└─────┴─────┴─────┘
shape: (6, 3)
┌─────┬─────┬─────┐
│ 1 ┆ 2 ┆ 3 │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════╡
│ 0 ┆ 0 ┆ 0 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 1 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 4 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 4 ┆ 6 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 4 ┆ 6 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 4 ┆ 6 │
└─────┴─────┴─────┘
and I want to compare them such that I end up with a boolean dataframe I can use as a predicate for a selection and aggregation:
shape: (6, 3)
┌───────┬───────┬───────┐
│ 1 ┆ 2 ┆ 3 │
│ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ bool │
╞═══════╪═══════╪═══════╡
│ true ┆ true ┆ true │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true ┆ true ┆ true │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true ┆ false ┆ true │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true ┆ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true ┆ true ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ false ┆ false │
└───────┴───────┴───────┘
In Python Pandas I might do df.where(df_1.ge(df_2)).sum().sum()
. What's the idiomatic way to do that with Rust Pola-rs?
<edit>
If you actually have a single dataframe you can do:
let mask =
when(all().gt_eq(
all().cummax(false).shift(1).fill_null(0)))
.then(all())
.otherwise(lit(NULL));
let out =
df_1.lazy().select(&[mask])
//.sum()
.collect();
</edit>
From https://stackoverflow.com/a/72899438
Masking out values by columns in another DataFrame is a potential for errors caused by different lengths. For this reason polars does not encourage such operations
It appears the recommended way is to add a suffix to one of the dataframes, "concat" them and use when/then/otherwise
.
.with_context()
has been added since that answer which can be used to access both dataframes.
In Python:
df1.lazy().with_context(
df2.lazy().select(pl.all().suffix("_right"))
).select([
pl.when(pl.col(name) >= pl.col(f"{name}_right"))
.then(pl.col(name))
for name in df1.columns
]).collect()
I've not used rust - but my attempt at a translation:
let mask =
df_1.get_column_names().iter().map(|name|
when(col(name).gt_eq(col(&format!("{name}_right"))))
.then(col(name))
.otherwise(lit(NULL))
).collect::<Vec<Expr>>();
let out =
df_1.lazy()
.with_context(&[
df_2.lazy().select(&[all().suffix("_right")])
])
.select(&mask)
//.sum()
.collect();
println!("{:#?}", out);
Output:
Ok(shape: (6, 3)
┌──────┬──────┬──────┐
│ 1 ┆ 2 ┆ 3 │
│ --- ┆ --- ┆ --- │
│ i32 ┆ i32 ┆ i32 │
╞══════╪══════╪══════╡
│ 1 ┆ 1 ┆ 1 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 4 ┆ 2 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ null ┆ 6 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ null ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 4 ┆ 4 ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ null ┆ null │
└──────┴──────┴──────┘)