
Multiply elements of list column in polars dataframe with elements of regular python list


I have a pl.DataFrame with a column comprising lists like this:

import polars as pl

df = pl.DataFrame(
    {
        "symbol": ["A", "A", "B", "B"],
        "roc": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    }
)

shape: (4, 2)
┌────────┬────────────┐
│ symbol ┆ roc        │
│ ---    ┆ ---        │
│ str    ┆ list[f64]  │
╞════════╪════════════╡
│ A      ┆ [0.1, 0.2] │
│ A      ┆ [0.3, 0.4] │
│ B      ┆ [0.5, 0.6] │
│ B      ┆ [0.7, 0.8] │
└────────┴────────────┘

I also have a regular Python list, weights = [0.3, 0.7].

What's an efficient way to multiply pl.col("roc") by weights so that the first and second elements of each list are multiplied by the first and second elements of weights, respectively?

The expected output is like this:

shape: (4, 3)
┌────────┬────────────┬──────────────┐
│ symbol ┆ roc        ┆ roc_wgt      │
│ ---    ┆ ---        ┆ ---          │
│ str    ┆ list[f64]  ┆ list[f64]    │
╞════════╪════════════╪══════════════╡
│ A      ┆ [0.1, 0.2] ┆ [0.03, 0.14] │ = [0.1 * 0.3, 0.2 * 0.7]
│ A      ┆ [0.3, 0.4] ┆ [0.09, 0.28] │ = [0.3 * 0.3, 0.4 * 0.7]
│ B      ┆ [0.5, 0.6] ┆ [0.15, 0.42] │ = [0.5 * 0.3, 0.6 * 0.7]
│ B      ┆ [0.7, 0.8] ┆ [0.21, 0.56] │ = [0.7 * 0.3, 0.8 * 0.7]
└────────┴────────────┴──────────────┘

Solution

  • Update: Broadcasting of literals/scalars for the List type was added in Polars 1.10.0, so the column can be multiplied by weights directly:

    df.with_columns(roc_wgt = pl.col.roc * weights)
    
    shape: (4, 3)
    ┌────────┬────────────┬──────────────┐
    │ symbol ┆ roc        ┆ roc_wgt      │
    │ ---    ┆ ---        ┆ ---          │
    │ str    ┆ list[f64]  ┆ list[f64]    │
    ╞════════╪════════════╪══════════════╡
    │ A      ┆ [0.1, 0.2] ┆ [0.03, 0.14] │
    │ A      ┆ [0.3, 0.4] ┆ [0.09, 0.28] │
    │ B      ┆ [0.5, 0.6] ┆ [0.15, 0.42] │
    │ B      ┆ [0.7, 0.8] ┆ [0.21, 0.56] │
    └────────┴────────────┴──────────────┘
    

    List arithmetic was merged in Polars 1.8.0, but broadcasting of literals (and scalars) for the List type was left to follow-on work.

    On versions before 1.10.0, the weights can be added as a column first:

    (df.with_columns(wgt = weights)
       .with_columns(roc_wgt = pl.col.roc * pl.col.wgt)
    )
    
    shape: (4, 4)
    ┌────────┬────────────┬────────────┬──────────────┐
    │ symbol ┆ roc        ┆ wgt        ┆ roc_wgt      │
    │ ---    ┆ ---        ┆ ---        ┆ ---          │
    │ str    ┆ list[f64]  ┆ list[f64]  ┆ list[f64]    │
    ╞════════╪════════════╪════════════╪══════════════╡
    │ A      ┆ [0.1, 0.2] ┆ [0.3, 0.7] ┆ [0.03, 0.14] │
    │ A      ┆ [0.3, 0.4] ┆ [0.3, 0.7] ┆ [0.09, 0.28] │
    │ B      ┆ [0.5, 0.6] ┆ [0.3, 0.7] ┆ [0.15, 0.42] │
    │ B      ┆ [0.7, 0.8] ┆ [0.3, 0.7] ┆ [0.21, 0.56] │
    └────────┴────────────┴────────────┴──────────────┘
    

    Broadcasting of literals works for the Array datatype as of 1.8.0, provided both the column and the literal use the same fixed-width Array dtype:

    dtype = pl.Array(float, 2)
    
    df.with_columns(roc_wgt = pl.col.roc.cast(dtype) * pl.lit(weights, dtype))
    
    shape: (4, 3)
    ┌────────┬────────────┬───────────────┐
    │ symbol ┆ roc        ┆ roc_wgt       │
    │ ---    ┆ ---        ┆ ---           │
    │ str    ┆ list[f64]  ┆ array[f64, 2] │
    ╞════════╪════════════╪═══════════════╡
    │ A      ┆ [0.1, 0.2] ┆ [0.03, 0.14]  │
    │ A      ┆ [0.3, 0.4] ┆ [0.09, 0.28]  │
    │ B      ┆ [0.5, 0.6] ┆ [0.15, 0.42]  │
    │ B      ┆ [0.7, 0.8] ┆ [0.21, 0.56]  │
    └────────┴────────────┴───────────────┘