
python-polars Join Column Values into a concatenated string

I am trying to write an aggregation routine where values in columns are concatenated based on a group_by statement.

I am trying to call a custom function to do the aggregation, and I want to avoid using a lambda (my understanding is that lambda functions only run serially, so performance would be worse). Here is my code:

def agg_ll_field(col_name) -> pl.Expr:
    return ';'.join(pl.col(col_name).drop_nulls().unique().sort())

dfa = df.lazy()\
    .group_by('SharedSourceSystem', 'FOPortfolioName').agg(
        agg_ll_field('BookingUnits').alias('BOOKG_UNIT'))

I keep on getting an error:

agg_ll_field: Unexpected:  can only join an iterable   <class 'TypeError'>

Would anyone be able to help resolve this?

I tried using the map_groups function instead - that seems to work but I'm trying to avoid map_groups, since performance is supposed to be worse.


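  • Python's built-in ';'.join(...) expects an iterable of strings, but pl.col(...) returns a pl.Expr, which is not iterable; that is where the TypeError comes from. Use the expression method str.join instead. Here is the full example using str.join: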

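    import polars as pl

    # Create a sample DataFrame
    data = {
        'SharedSourceSystem': ['A', 'A', 'B', 'B', 'B'],
        'FOPortfolioName': ['X', 'X', 'Y', 'Y', 'Y'],
        'BookingUnits': [1, 2, 2, 2, 3],
    }
    df = pl.DataFrame(data)

    # Define the custom aggregation function
    def agg_ll_field(col_name) -> pl.Expr:
        # Sort while the values are still numeric, then cast so str.join receives strings
        return pl.col(col_name).drop_nulls().unique().sort().cast(pl.String).str.join(';')

    # Apply the lazy group_by and aggregation (maintain_order keeps the groups in input order)
    dfa = (
        df.lazy()
          .group_by('SharedSourceSystem', 'FOPortfolioName', maintain_order=True)
          .agg(agg_ll_field('BookingUnits').alias('BOOKG_UNIT'))
          .collect()
    )
    print(dfa)

    # Output
    shape: (2, 3)
    ┌────────────────────┬─────────────────┬────────────┐
    │ SharedSourceSystem ┆ FOPortfolioName ┆ BOOKG_UNIT │
    │ ---                ┆ ---             ┆ ---        │
    │ str                ┆ str             ┆ str        │
    ╞════════════════════╪═════════════════╪════════════╡
    │ A                  ┆ X               ┆ 1;2        │
    │ B                  ┆ Y               ┆ 2;3        │
    └────────────────────┴─────────────────┴────────────┘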