I have two Polars LazyFrames:
import numpy as np
import polars as pl

n = 10  # number of rows (example value)
df1 = pl.DataFrame(data={
    'foo': np.random.uniform(0, 127, size=n).astype(np.float64),
    'bar': np.random.uniform(1e3, 32767, size=n).astype(np.float64),
    'baz': np.random.uniform(1e6, 2147483, size=n).astype(np.float64)
}).lazy()
df2 = pl.DataFrame(data={
    'foo': np.random.uniform(0, 127, size=n).astype(np.float64),
    'bar': np.random.uniform(1e3, 32767, size=n).astype(np.float64),
    'baz': np.random.uniform(1e6, 2147483, size=n).astype(np.float64)
}).lazy()
I would like to multiply each column in df1 by the corresponding column in df2.
If I convert these to non-lazy DataFrames, I can achieve this:
df1.collect() * df2.collect()
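┌─────────────┬──────────┬───────────┐
│ foo         ┆ bar      ┆ baz       │
│ ---         ┆ ---      ┆ ---       │
│ f64         ┆ f64      ┆ f64       │
╞═════════════╪══════════╪═══════════╡
│ 3831.295563 ┆ 6.4637e6 ┆ 3.3669e12 │
│ 164.194271  ┆ 2.9691e8 ┆ 2.2696e12 │
│ 3655.918761 ┆ 1.9444e7 ┆ 2.3625e12 │
│ 7191.48868  ┆ 3.7044e7 ┆ 3.1687e12 │
│ 9559.505277 ┆ 2.6864e8 ┆ 2.5426e12 │
└─────────────┴──────────┴───────────┘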
However, if I try the same expression on the LazyFrames, I get an exception:
df1 * df2
TypeError: unsupported operand type(s) for *: 'LazyFrame' and 'LazyFrame'
How can I perform column-wise multiplication across two LazyFrames?
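You'll need to join. LazyFrame doesn't support arithmetic operators between frames, so add a row index to each frame and join on it. The columns that come from df2 get the default _right suffix, which lets you pair them up in the select: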
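(
    df1.with_row_index()  # adds an "index" column: 0, 1, 2, ...
    .join(df2.with_row_index(), on="index")  # df2's clashing column names get the "_right" suffix
    # foo * foo_right, bar * bar_right, ...; each result keeps the left column's name
    .select(pl.col(col) * pl.col(f"{col}_right") for col in df1.columns)
    .collect()
)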
shape: (10, 3)
┌─────────────┬──────────┬───────────┐
│ foo         ┆ bar      ┆ baz       │
│ ---         ┆ ---      ┆ ---       │
│ f64         ┆ f64      ┆ f64       │
╞═════════════╪══════════╪═══════════╡
│ 6623.602754 ┆ 2.7173e8 ┆ 3.7654e12 │
│ 2588.499522 ┆ 7.4295e8 ┆ 3.0266e12 │
│ 933.474643  ┆ 3.7090e8 ┆ 4.2794e12 │
│ 7061.625136 ┆ 2.2365e8 ┆ 2.7040e12 │
│ …           ┆ …        ┆ …         │
│ 2717.969236 ┆ 4.9398e7 ┆ 3.0930e12 │
│ 785.760153  ┆ 1.6305e8 ┆ 1.8954e12 │
│ 9534.366291 ┆ 7.3153e8 ┆ 1.9056e12 │
│ 1916.452503 ┆ 1.4976e8 ┆ 3.2704e12 │
└─────────────┴──────────┴───────────┘