
Python-polars: how to multiply each element in a list with a value in a different column?


I have a DataFrame with one row per group. Each row has a weight and a list of values, and the lists can be of arbitrary length. For example:

import polars as pl

df = pl.DataFrame(
    {
        "Group": ["Group1", "Group2", "Group3"],
        "Weight": [100.0, 200.0, 300.0],
        "Vals": [[0.5, 0.5, 0.8],[0.5, 0.5, 0.8], [0.7, 0.9]]
    }
)
┌────────┬────────┬─────────────────┐
│ Group  ┆ Weight ┆ Vals            │
│ ---    ┆ ---    ┆ ---             │
│ str    ┆ f64    ┆ list[f64]       │
╞════════╪════════╪═════════════════╡
│ Group1 ┆ 100.0  ┆ [0.5, 0.5, 0.8] │
│ Group2 ┆ 200.0  ┆ [0.5, 0.5, 0.8] │
│ Group3 ┆ 300.0  ┆ [0.7, 0.9]      │
└────────┴────────┴─────────────────┘

My goal is to calculate a 'Weighted' column, which would be the product of each item in the Vals list with the value in the Weight column:

┌────────┬────────┬─────────────────┬─────────────────┐
│ Group  ┆ Weight ┆ Vals            ┆ Weighted        │
│ ---    ┆ ---    ┆ ---             ┆ ---             │
│ str    ┆ f64    ┆ list[f64]       ┆ list[i64]       │
╞════════╪════════╪═════════════════╪═════════════════╡
│ Group1 ┆ 100.0  ┆ [0.5, 0.5, 0.8] ┆ [50, 50, 80]    │
│ Group2 ┆ 200.0  ┆ [0.5, 0.5, 0.8] ┆ [100, 100, 160] │
│ Group3 ┆ 300.0  ┆ [0.7, 0.9]      ┆ [210, 270]      │
└────────┴────────┴─────────────────┴─────────────────┘

I've tried a few different things:

df.with_columns(
    pl.col("Vals").list.eval(pl.element() * 3).alias("Weight1"), #Multiplying with literal works
    pl.col("Vals").list.eval(pl.element() * pl.col("Weight")).alias("Weight2"), #Does not work
    pl.col("Vals").list.eval(pl.element() * pl.col("Unknown")).alias("Weight3"), #Unknown columns give same value
    pl.col("Vals").list.eval(pl.col("Vals") * pl.col("Weight")).alias("Weight4"), #Same effect
    # pl.col('Vals') * 3 -> gives an error
)
┌────────┬────────┬────────────┬────────────┬──────────────┬──────────────┬────────────────────┐
│ Group  ┆ Weight ┆ Vals       ┆ Weight1    ┆ Weight2      ┆ Weight3      ┆ Weight4            │
│ ---    ┆ ---    ┆ ---        ┆ ---        ┆ ---          ┆ ---          ┆ ---                │
│ str    ┆ f64    ┆ list[f64]  ┆ list[f64]  ┆ list[f64]    ┆ list[f64]    ┆ list[f64]          │
╞════════╪════════╪════════════╪════════════╪══════════════╪══════════════╪════════════════════╡
│ Group1 ┆ 100.0  ┆ [0.5, 0.5, ┆ [1.5, 1.5, ┆ [0.25, 0.25, ┆ [0.25, 0.25, ┆ [0.25, 0.25, 0.64] │
│        ┆        ┆ 0.8]       ┆ 2.4]       ┆ 0.64]        ┆ 0.64]        ┆                    │
│ Group2 ┆ 200.0  ┆ [0.5, 0.5, ┆ [1.5, 1.5, ┆ [0.25, 0.25, ┆ [0.25, 0.25, ┆ [0.25, 0.25, 0.64] │
│        ┆        ┆ 0.8]       ┆ 2.4]       ┆ 0.64]        ┆ 0.64]        ┆                    │
│ Group3 ┆ 300.0  ┆ [0.7, 0.9] ┆ [2.1, 2.7] ┆ [0.49, 0.81] ┆ [0.49, 0.81] ┆ [0.49, 0.81]       │
└────────┴────────┴────────────┴────────────┴──────────────┴──────────────┴────────────────────┘

Unless I'm misunderstanding something, it seems that columns outside the list cannot be accessed from within list.eval: Weight2, Weight3, and Weight4 all just square each element, and even a nonexistent column name raises no error. A Python list comprehension might work, but that doesn't seem like a neat solution.

What would be the recommended approach here? Any help would be appreciated!


Solution

  • EDIT - Polars update:

    As of the latest version of Polars, this is now the correct syntax:

    df = pl.DataFrame(
        {
            "Group": ["Group1", "Group2", "Group3"],
            "Weight": [100.0, 200.0, 300.0],
            "Vals": [[0.5, 0.5, 0.8],[0.5, 0.5, 0.8], [0.7, 0.9]]
        }
    )
    
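    (df
        # One row per list element, so Weight lines up with each value
        .explode('Vals')
        .with_columns(Weighted = pl.col('Weight')*pl.col('Vals'))
        # Collect the rows back into lists, one per group;
        # group_by does not preserve the original row order by default
        .group_by('Group')
        .agg(
            pl.col('Weight').first(),
            pl.col('Vals'),
            pl.col('Weighted')
        )
    )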
    
    shape: (3, 4)
    ┌────────┬────────┬─────────────────┬───────────────────────┐
    │ Group  ┆ Weight ┆ Vals            ┆ Weighted              │
    │ ---    ┆ ---    ┆ ---             ┆ ---                   │
    │ str    ┆ f64    ┆ list[f64]       ┆ list[f64]             │
    ╞════════╪════════╪═════════════════╪═══════════════════════╡
    │ Group3 ┆ 300.0  ┆ [0.7, 0.9]      ┆ [210.0, 270.0]        │
    │ Group1 ┆ 100.0  ┆ [0.5, 0.5, 0.8] ┆ [50.0, 50.0, 80.0]    │
    │ Group2 ┆ 200.0  ┆ [0.5, 0.5, 0.8] ┆ [100.0, 100.0, 160.0] │
    └────────┴────────┴─────────────────┴───────────────────────┘