python-polars

Polars pl.col(field).name.map_fields applies to all struct columns (not the one specified)


I have this code:

import polars as pl

cols = ['Delta', 'Qty']

metrics = {'CHECK.US': {'Delta': {'ABC': 1, 'DEF': 2}, 'Qty': {'GHIJ': 3, 'TT': 4}},
           'CHECK.NA': {},
           'CHECK.FR': {'Delta': {'QQQ': 7, 'ABC': 6}, 'Qty': {'SS': 9, 'TT': 5}}
          }

df = pl.DataFrame([{col: v.get(col) for col in cols} for v in metrics.values()])\
       .insert_column(0, pl.Series('key', metrics.keys()))\
       .with_columns([pl.col(col).name.map_fields(lambda x: f'{col} ({x})') for col in cols])

Now, df.unnest('Qty') correctly gives all columns formatted as Qty (xxx):

shape: (3, 5)
┌──────────┬────────────┬────────────┬──────────┬──────────┐
│ key      ┆ Delta      ┆ Qty (GHIJ) ┆ Qty (TT) ┆ Qty (SS) │
│ ---      ┆ ---        ┆ ---        ┆ ---      ┆ ---      │
│ str      ┆ struct[3]  ┆ i64        ┆ i64      ┆ i64      │
╞══════════╪════════════╪════════════╪══════════╪══════════╡
│ CHECK.US ┆ {1,2,null} ┆ 3          ┆ 4        ┆ null     │
│ CHECK.NA ┆ null       ┆ null       ┆ null     ┆ null     │
│ CHECK.FR ┆ {6,null,7} ┆ null       ┆ 5        ┆ 9        │
└──────────┴────────────┴────────────┴──────────┴──────────┘

However, when I do the same thing for df.unnest('Delta') it incorrectly returns columns with Qty (xxx):

shape: (3, 5)
┌──────────┬───────────┬───────────┬───────────┬────────────┐
│ key      ┆ Qty (ABC) ┆ Qty (DEF) ┆ Qty (QQQ) ┆ Qty        │
│ ---      ┆ ---       ┆ ---       ┆ ---       ┆ ---        │
│ str      ┆ i64       ┆ i64       ┆ i64       ┆ struct[3]  │
╞══════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ CHECK.US ┆ 1         ┆ 2         ┆ null      ┆ {3,4,null} │
│ CHECK.NA ┆ null      ┆ null      ┆ null      ┆ null       │
│ CHECK.FR ┆ 6         ┆ null      ┆ 7         ┆ {null,5,9} │
└──────────┴───────────┴───────────┴───────────┴────────────┘

The values look correct, just the column names are wrong.

Am I using pl.col(col).name.map_fields(...) incorrectly? How can I fix my code so that the output of df.unnest('Delta') becomes this:

shape: (3, 5)
┌──────────┬─────────────┬─────────────┬─────────────┬────────────┐
│ key      ┆ Delta (ABC) ┆ Delta (DEF) ┆ Delta (QQQ) ┆ Qty        │
│ ---      ┆ ---         ┆ ---         ┆ ---         ┆ ---        │
│ str      ┆ i64         ┆ i64         ┆ i64         ┆ struct[3]  │
╞══════════╪═════════════╪═════════════╪═════════════╪════════════╡

?


Solution

  • This is a general Python gotcha with lambdas defined inside loops, not something specific to Polars [1].

    funcs = []
    
    for col in "one", "two":
        funcs.append(lambda: print(f"{col=}"))
        
    for func in funcs: func()
    
    # Output:
    # col='two'
    # col='two'
    

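    Each lambda looks up col when it is called, not when it is defined. By the time any of them runs, the loop has finished and col holds its last value, so every lambda sees 'two'.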

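    The workaround is to bind the current value as a default argument (col=col). Default values are evaluated once, when the lambda is defined, so each lambda keeps its own copy.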

    funcs = []
    
    for col in "one", "two":
        funcs.append(lambda col=col: print(f"{col=}"))
        
    for func in funcs: func()
    
    # Output:
    # col='one'
    # col='two'
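
    If the default-argument trick feels too implicit, two other common ways to freeze the value are a small factory function and functools.partial. The sketch below compares them with the late-binding version; make_printer and describe are hypothetical helper names, not part of any library:

```python
from functools import partial


def make_printer(col):
    # `col` is a parameter here, so every call gets its own binding
    # that later loop iterations cannot change.
    return lambda: f"{col=}"


def describe(col):
    # Plain one-argument function; partial() fixes `col` up front.
    return f"{col=}"


late, by_factory, by_partial = [], [], []

for col in "one", "two":
    late.append(lambda: f"{col=}")           # looks up `col` when called
    by_factory.append(make_printer(col))     # value captured per call
    by_partial.append(partial(describe, col))

print([f() for f in late])        # ["col='two'", "col='two'"]
print([f() for f in by_factory])  # ["col='one'", "col='two'"]
print([f() for f in by_partial])  # ["col='one'", "col='two'"]
```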
    

    In your specific example:

    pl.col(col).name.map_fields(lambda x, col=col: f'{col} ({x})') for col in cols  
    #                                     ^^^^^^^   
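
    To check the renaming logic without running Polars, the lambdas can be called directly on the field names. This is only a stand-in for what map_fields does with the function it receives; the fields dictionary is copied by hand from the question's data, and renamers and renamed are illustrative names:

```python
cols = ["Delta", "Qty"]

# Field names per struct column, taken from the question's `metrics` data.
fields = {"Delta": ["ABC", "DEF", "QQQ"], "Qty": ["GHIJ", "TT", "SS"]}

# Same lambda as in the fix: `col=col` freezes the current column name.
renamers = {col: (lambda x, col=col: f"{col} ({x})") for col in cols}

# Stand-in for map_fields: apply each column's function to its field names.
renamed = {col: [renamers[col](name) for name in fields[col]] for col in cols}

print(renamed["Delta"])  # ['Delta (ABC)', 'Delta (DEF)', 'Delta (QQQ)']
print(renamed["Qty"])    # ['Qty (GHIJ)', 'Qty (TT)', 'Qty (SS)']
```

    Without col=col, both entries in renamers would produce the Qty prefix, which matches the wrong column names seen in the question.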
    

    1. https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result