Search code examples
pythonpython-polars

Using a column values within the round function


I have this dataframe:

shape: (5, 3)
┌───────────────┬──────────────┬────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro │
│ ---           ┆ ---          ┆ ---                    │
│ f64           ┆ u32          ┆ f64                    │
╞═══════════════╪══════════════╪════════════════════════╡
│ 6.4881        ┆ 5            ┆ 0.154128               │
│ 6.5196        ┆ 5            ┆ 0.153384               │
│ 6.4527        ┆ 5            ┆ 0.154974               │
│ 6.41          ┆ 3            ┆ 0.156006               │
│ 6.425         ┆ 4            ┆ 0.155642               │
└───────────────┴──────────────┴────────────────────────┘

I would like to round the value of in every cell of the reverse_rate_from_euro with the matching value in the corresponding cell of the sig_figs_len column.

I came up with a solution which uses the apply/map_rows function, but since the dataset is quite big and using apply/pure python isn't an optimized ideal solution to go about it, i would like to find a better solution.

Here's the snippet:

df = df.with_columns(
    (df.map_rows(lambda df_: round(df_[-1], df_[-2])))
    .to_series()
    .alias("reverse_rate_to_euro_rounded_sig_figs")

Is there a better solution that uses any of the built-in Polars expressions API?

result set should look like so:

shape: (5, 4)
┌───────────────┬──────────────┬────────────────────────┬───────────────────────────────────┐
│ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro ┆ reverse_rate_to_euro_rounded_sig… │
│ ---           ┆ ---          ┆ ---                    ┆ ---                               │
│ f64           ┆ u32          ┆ f64                    ┆ f64                               │
╞═══════════════╪══════════════╪════════════════════════╪═══════════════════════════════════╡
│ 6.4881        ┆ 5            ┆ 0.154128               ┆ 0.15413                           │
│ 6.5196        ┆ 5            ┆ 0.153384               ┆ 0.15338                           │
│ 6.4527        ┆ 5            ┆ 0.154974               ┆ 0.15497                           │
│ 6.41          ┆ 3            ┆ 0.156006               ┆ 0.156                             │
│ 6.425         ┆ 4            ┆ 0.155642               ┆ 0.1556                            │
└───────────────┴──────────────┴────────────────────────┴───────────────────────────────────┘

Any input is highly appreciated! thanks for reading!


Solution

  • You could probably do something like this:

    df.with_columns(
        (
            pl.col('reverse_rate_from_euro') * 
            pl.lit(10).pow(pl.col('sig_figs_len'))
        ).round() * 
        pl.lit(0.1).pow(pl.col('sig_figs_len'))
    )
    
    ┌───────────────┬──────────────┬────────────────────────┐
    │ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro │
    │ ---           ┆ ---          ┆ ---                    │
    │ f64           ┆ i64          ┆ f64                    │
    ╞═══════════════╪══════════════╪════════════════════════╡
    │ 6.4881        ┆ 3            ┆ 0.154                  │
    │ 6.5196        ┆ 4            ┆ 0.1534                 │
    └───────────────┴──────────────┴────────────────────────┘
    

    alternatively, if you know that number of different sig_figs_len values is not large, you can just enumerate over unique() values and create result with coalesce() and when():

    df.with_columns(
        pl.coalesce(
            pl.when(pl.col("sig_figs_len") == x)
            .then(pl.col("reverse_rate_from_euro").round(x))
            for x in df['sig_figs_len'].unique()
        ).alias('reverse_rate_to_euro_rounded_sig_figs')
    )
    
    ┌───────────────┬──────────────┬────────────────────────┬───────────────────────────────────┐
    │ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_euro ┆ reverse_rate_to_euro_rounded_sig… │
    │ ---           ┆ ---          ┆ ---                    ┆ ---                               │
    │ f64           ┆ i64          ┆ f64                    ┆ f64                               │
    ╞═══════════════╪══════════════╪════════════════════════╪═══════════════════════════════════╡
    │ 6.4881        ┆ 3            ┆ 0.154128               ┆ 0.154                             │
    │ 6.5196        ┆ 4            ┆ 0.153384               ┆ 0.1534                            │
    └───────────────┴──────────────┴────────────────────────┴───────────────────────────────────┘
    

    Essentially, what we do here is creating a Iterable[IntoExpr] which you can see as a list of columns where each column contains only values rounded to certain sig_figs_len value:

    df.with_columns(
        pl.when(pl.col("sig_figs_len") == x)
        .then(pl.col("reverse_rate_from_euro").round(x)).name.suffix(f"_{x}")
         for x in df['sig_figs_len'].unique()
    ))
    
    ┌───────────────┬──────────────┬─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
    │ exchange_rate ┆ sig_figs_len ┆ reverse_rate_from_e ┆ reverse_rate_from_e ┆ reverse_rate_from_e ┆ reverse_rate_from_e │
    │ ---           ┆ ---          ┆ uro                 ┆ uro_3               ┆ uro_4               ┆ uro_5               │
    │ f64           ┆ i64          ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---                 │
    │               ┆              ┆ f64                 ┆ f64                 ┆ f64                 ┆ f64                 │
    ╞═══════════════╪══════════════╪═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════╡
    │ 6.4881        ┆ 5            ┆ 0.154128            ┆ null                ┆ null                ┆ 0.15413             │
    │ 6.5196        ┆ 5            ┆ 0.153384            ┆ null                ┆ null                ┆ 0.15338             │
    │ 6.4527        ┆ 5            ┆ 0.154974            ┆ null                ┆ null                ┆ 0.15497             │
    │ 6.41          ┆ 3            ┆ 0.156006            ┆ 0.156               ┆ null                ┆ null                │
    │ 6.425         ┆ 4            ┆ 0.155642            ┆ null                ┆ 0.1556              ┆ null                │
    └───────────────┴──────────────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘
    

    and then we use the fact that pl.coalesce() accept iterable of Expr to get first non-null column