I have two DataFrames like this.
import numpy as np
import polars as pl

df1 = pl.DataFrame({
    "col_1": np.random.rand(),
    "col_2": np.random.rand(),
    "col_3": np.random.rand()
})
┌──────────┬─────────┬──────────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════════╪═════════╪══════════╡
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘
df2 = pl.DataFrame({
    "col_1": np.random.randint(0, 2, 5),
    "col_2": np.random.randint(0, 2, 5),
    "col_3": np.random.randint(0, 2, 5)
})
┌───────┬───────┬───────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════╪═══════╪═══════╡
│ 0 ┆ 0 ┆ 0 │
│ 0 ┆ 1 ┆ 0 │
│ 1 ┆ 1 ┆ 1 │
│ 1 ┆ 1 ┆ 0 │
│ 1 ┆ 1 ┆ 1 │
└───────┴───────┴───────┘
I want to replace the 1s in the second DataFrame with the corresponding value from the first DataFrame, and replace the zeros with 1s, resulting in this:
┌──────────┬─────────┬──────────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════════╪═════════╪══════════╡
│ 1.0 ┆ 1.0 ┆ 1.0 │
│ 1.0 ┆ 0.84115 ┆ 1.0 │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
│ 0.534349 ┆ 0.84115 ┆ 1.0 │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘
I tried reshaping df1 to have the same height as df2, like this:
df1 = df1.select(pl.all().repeat_by(df2.height).arr.explode())
And if I rename the columns so they're not the same, I could horizontally concatenate the two DataFrames using pl.concat. But I'm unsure where to go from there. How could I achieve this? Or is there a better approach?
You could build one when/then/otherwise expression per column name, taking df1's single value for that column:
df2.select(
    pl.when(pl.col(col) == 1)
    .then(df1.get_column(col).item())
    .otherwise(1)
    .alias(col)
    for col in df2.columns
)
shape: (5, 3)
┌──────────┬─────────┬──────────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞══════════╪═════════╪══════════╡
│ 1.0 ┆ 1.0 ┆ 1.0 │
│ 1.0 ┆ 0.84115 ┆ 1.0 │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
│ 0.534349 ┆ 0.84115 ┆ 1.0 │
│ 0.534349 ┆ 0.84115 ┆ 0.526435 │
└──────────┴─────────┴──────────┘