Search code examples
pythonpython-polars

Python Polars unable to convert f64 column to str and aggregate to list


Working with a dataframe of the type:

df = pl.DataFrame({
'order': [38681.0, 38692.0, 38680.0, 38693.0],
'shipto': ["471433", "239269", "471433","239269"],
'value': [10,20,30,None]
})

Need to Groupby the 'Shipto' column, sum the 'values' and aggregate the 'order' to a list. Have tried a few things but can't get it work.

The base script:

df = (df
     .with_columns([
         pl.col('order').cast(pl.Utf8), 
         pl.col(pl.Float64).fill_null(0)
     ])
   .groupby('shipto')
   .agg([
        pl.col('order').apply(lambda x: str(x)).alias('order_list'),
        pl.sum('value')
    ])
    )

Returns:

shipto order_list value
str str i64
471433 shape: (2,) 40
Series: '' [f64]
[
...
239269 shape: (2,) 20
Series: '' [f64]
[
...

What I am hoping to get in the 'order_list' column is either ([38681.0,38680.0],[38692.0,38693.0]) or (['38681.0','38680.0'],['38692.0','38693.0'])

I am guessing the 'order' column needs to be converted from f64 values to string (Utf8), but can't get that that to work.

Variations to the line 'pl.col('order').cast(pl.Utf8), #.cast(pl.Float64)' that I have tried so far:

pl.col('order').cast(pl.Float64).cast(pl.Utf8),

pl.col('order').cast(pl.Int64).cast(pl.Utf8),

pl.col('order').map(lambda x: str(x)),

pl.col('order').apply(lambda x: str(int(x))),

pl
  .when(pl.col('order').is_null())
  .then(pl.lit(''))
  .otherwise(pl.col('order').cast(pl.Float64).cast(pl.Utf8)).alias('order'),

Sure there is some basic error, but continuing to struggle with this and any help would be greatly appreciated.


Solution

  • if you do not specify an aggregation function inside of agg, the result will be a list.

    This might be what you are looking for.

    df.group_by('shipto').agg(
        'order',
        pl.sum('value')
    )
    
    # Result
    
    shape: (2, 3)
    ┌────────┬────────────────────┬───────┐
    │ shipto ┆ order              ┆ value │
    │ ---    ┆ ---                ┆ ---   │
    │ str    ┆ list[f64]          ┆ i64   │
    ╞════════╪════════════════════╪═══════╡
    │ 471433 ┆ [38681.0, 38680.0] ┆ 40    │
    │ 239269 ┆ [38692.0, 38693.0] ┆ 20    │
    └────────┴────────────────────┴───────┘