python-polars Join Column Values into a concatenated string


I am trying to write an aggregation routine that concatenates the values of a column within each group of a group_by.

I want to do the aggregation in a custom function and avoid lambda (my understanding is that lambda functions run serially, so performance would be worse). Here is my code:

def agg_ll_field(col_name) -> pl.Expr:
    return ';'.join(pl.col(col_name).drop_nulls().unique().sort())

dfa = df.lazy()\
    .group_by('SharedSourceSystem', 'FOPortfolioName').agg(
        agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
    ).collect()

I keep on getting an error:

agg_ll_field: Unexpected:  can only join an iterable   <class 'TypeError'>

Would anyone be able to help resolve this?

I tried map_groups instead, and that seems to work, but I would like to avoid it because its performance is supposed to be worse.


Solution

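  • Python's built-in ';'.join(...) expects an iterable of strings, but pl.col(...) is a Polars expression, which is not iterable; that is why the TypeError is raised. Use the expression method str.join instead. Here is the full example: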

    import polars as pl
    # Create a sample DataFrame
    data = {
        'SharedSourceSystem': ['A', 'A', 'B', 'B', 'B'],
        'FOPortfolioName': ['X', 'X', 'Y', 'Y', 'Y'],
        'BookingUnits': [1, 2, 2, 2, 3]
    }
    
    df = pl.DataFrame(data)
    
    # Define the custom aggregation function
    def agg_ll_field(col_name) -> pl.Expr:
        return pl.col(col_name).drop_nulls().unique().sort().str.join(';')
    
    # Group lazily and aggregate
    dfa = (
        df.lazy()
          .group_by('SharedSourceSystem', 'FOPortfolioName')
          .agg(
              agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
          )
          .collect()
    )
    
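    print(dfa)

    # Output (group order may vary between runs unless maintain_order=True is passed to group_by)

    shape: (2, 3)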
    ┌────────────────────┬─────────────────┬────────────┐
    │ SharedSourceSystem ┆ FOPortfolioName ┆ BOOKG_UNIT │
    │ ---                ┆ ---             ┆ ---        │
    │ str                ┆ str             ┆ str        │
    ╞════════════════════╪═════════════════╪════════════╡
    │ A                  ┆ X               ┆ 1;2        │
    │ B                  ┆ Y               ┆ 2;3        │
    └────────────────────┴─────────────────┴────────────┘