I have this dataframe:
shape: (5, 3)
┌───────────────┬──────────────┬────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro │
│ --- ┆ --- ┆ --- │
│ f64 ┆ u32 ┆ f64 │
╞═══════════════╪══════════════╪════════════════════════╡
│ 6.4881 ┆ 5 ┆ 0.154128 │
│ 6.5196 ┆ 5 ┆ 0.153384 │
│ 6.4527 ┆ 5 ┆ 0.154974 │
│ 6.41 ┆ 3 ┆ 0.156006 │
│ 6.425 ┆ 4 ┆ 0.155642 │
└───────────────┴──────────────┴────────────────────────┘
I would like to round the value of in every cell of the reverse_rate_from_euro
with the matching value in the corresponding cell of the sig_figs_len
column.
I came up with a solution which uses the apply/map_rows function, but since the dataset is quite big and using apply/pure python isn't an optimized ideal solution to go about it, i would like to find a better solution.
Here's the snippet:
df = df.with_columns(
(df.map_rows(lambda df_: round(df_[-1], df_[-2])))
.to_series()
.alias("reverse_rate_to_euro_rounded_sig_figs")
Is there a better solution that uses any of the built-in Polars expressions API?
result set should look like so:
shape: (5, 4)
┌───────────────┬──────────────┬────────────────────────┬───────────────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro ┆ reverse_rate_to_euro_rounded_sig… │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ u32 ┆ f64 ┆ f64 │
╞═══════════════╪══════════════╪════════════════════════╪═══════════════════════════════════╡
│ 6.4881 ┆ 5 ┆ 0.154128 ┆ 0.15413 │
│ 6.5196 ┆ 5 ┆ 0.153384 ┆ 0.15338 │
│ 6.4527 ┆ 5 ┆ 0.154974 ┆ 0.15497 │
│ 6.41 ┆ 3 ┆ 0.156006 ┆ 0.156 │
│ 6.425 ┆ 4 ┆ 0.155642 ┆ 0.1556 │
└───────────────┴──────────────┴────────────────────────┴───────────────────────────────────┘
Any input is highly appreciated! thanks for reading!
You could probably do something like this:
df.with_columns(
(
pl.col('reverse_rate_from_euro') *
pl.lit(10).pow(pl.col('sig_figs_len'))
).round() *
pl.lit(0.1).pow(pl.col('sig_figs_len'))
)
┌───────────────┬──────────────┬────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro │
│ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ f64 │
╞═══════════════╪══════════════╪════════════════════════╡
│ 6.4881 ┆ 3 ┆ 0.154 │
│ 6.5196 ┆ 4 ┆ 0.1534 │
└───────────────┴──────────────┴────────────────────────┘
alternatively, if you know that number of different sig_figs_len
values is not large, you can just enumerate over unique()
values and create result with coalesce()
and when()
:
df.with_columns(
pl.coalesce(
pl.when(pl.col("sig_figs_len") == x)
.then(pl.col("reverse_rate_from_euro").round(x))
for x in df['sig_figs_len'].unique()
).alias('reverse_rate_to_euro_rounded_sig_figs')
)
┌───────────────┬──────────────┬────────────────────────┬───────────────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro ┆ reverse_rate_to_euro_rounded_sig… │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ i64 ┆ f64 ┆ f64 │
╞═══════════════╪══════════════╪════════════════════════╪═══════════════════════════════════╡
│ 6.4881 ┆ 3 ┆ 0.154128 ┆ 0.154 │
│ 6.5196 ┆ 4 ┆ 0.153384 ┆ 0.1534 │
└───────────────┴──────────────┴────────────────────────┴───────────────────────────────────┘
Essentially, what we do here is creating a Iterable[IntoExpr]
which you can see as a list of columns where each column contains only values rounded to certain sig_figs_len
value:
df.with_columns(
pl.when(pl.col("sig_figs_len") == x)
.then(pl.col("reverse_rate_from_euro").round(x)).name.suffix(f"_{x}")
for x in df['sig_figs_len'].unique()
))
┌───────────────┬──────────────┬─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_e ┆ reverse_rate_from_e ┆ reverse_rate_from_e ┆ reverse_rate_from_e │
│ --- ┆ --- ┆ uro ┆ uro_3 ┆ uro_4 ┆ uro_5 │
│ f64 ┆ i64 ┆ --- ┆ --- ┆ --- ┆ --- │
│ ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════════════╪══════════════╪═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════╡
│ 6.4881 ┆ 5 ┆ 0.154128 ┆ null ┆ null ┆ 0.15413 │
│ 6.5196 ┆ 5 ┆ 0.153384 ┆ null ┆ null ┆ 0.15338 │
│ 6.4527 ┆ 5 ┆ 0.154974 ┆ null ┆ null ┆ 0.15497 │
│ 6.41 ┆ 3 ┆ 0.156006 ┆ 0.156 ┆ null ┆ null │
│ 6.425 ┆ 4 ┆ 0.155642 ┆ null ┆ 0.1556 ┆ null │
└───────────────┴──────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘
and then we use the fact that pl.coalesce()
accept iterable of Expr
to get first non-null column