I have this code:
import polars as pl
cols = ['Delta', 'Qty']
metrics = {'CHECK.US': {'Delta': {'ABC': 1, 'DEF': 2}, 'Qty': {'GHIJ': 3, 'TT': 4}},
'CHECK.NA': {},
'CHECK.FR': {'Delta': {'QQQ': 7, 'ABC': 6}, 'Qty': {'SS': 9, 'TT': 5}}
}
df = pl.DataFrame([{col: v.get(col) for col in cols} for v in metrics.values()])\
.insert_column(0, pl.Series('key', metrics.keys()))\
.with_columns([pl.col(col).name.map_fields(lambda x: f'{col} ({x})') for col in cols])
Now, df.unnest('Qty') correctly gives all columns formatted as Qty (xxx):
shape: (3, 5)
┌──────────┬────────────┬────────────┬──────────┬──────────┐
│ key ┆ Delta ┆ Qty (GHIJ) ┆ Qty (TT) ┆ Qty (SS) │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ struct[3] ┆ i64 ┆ i64 ┆ i64 │
╞══════════╪════════════╪════════════╪══════════╪══════════╡
│ CHECK.US ┆ {1,2,null} ┆ 3 ┆ 4 ┆ null │
│ CHECK.NA ┆ null ┆ null ┆ null ┆ null │
│ CHECK.FR ┆ {6,null,7} ┆ null ┆ 5 ┆ 9 │
└──────────┴────────────┴────────────┴──────────┴──────────┘
However, when I do the same thing with df.unnest('Delta'), it incorrectly returns columns named Qty (xxx):
shape: (3, 5)
┌──────────┬───────────┬───────────┬───────────┬────────────┐
│ key ┆ Qty (ABC) ┆ Qty (DEF) ┆ Qty (QQQ) ┆ Qty │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ struct[3] │
╞══════════╪═══════════╪═══════════╪═══════════╪════════════╡
│ CHECK.US ┆ 1 ┆ 2 ┆ null ┆ {3,4,null} │
│ CHECK.NA ┆ null ┆ null ┆ null ┆ null │
│ CHECK.FR ┆ 6 ┆ null ┆ 7 ┆ {null,5,9} │
└──────────┴───────────┴───────────┴───────────┴────────────┘
The values look correct, just the column names are wrong.
Am I using pl.col(col).name.map_fields(...) incorrectly? How can I fix my code so that the output becomes this?
shape: (3, 5)
┌──────────┬─────────────┬─────────────┬─────────────┬────────────┐
│ key ┆ Delta (ABC) ┆ Delta (DEF) ┆ Delta (QQQ) ┆ Qty │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ struct[3] │
╞══════════╪═════════════╪═════════════╪═════════════╪════════════╡
This is less a Polars issue than a general Python gotcha with lambdas created inside loops.
funcs = []
for col in "one", "two":
    funcs.append(lambda: print(f"{col=}"))
for func in funcs: func()
col='two'
col='two'
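The lambdas do not store the value col had when they were created. Each one looks col up when it is called, and by then the loop has finished, so every lambda sees the last value.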
The workaround is to bind the current value as a default argument, because defaults are evaluated when the lambda is defined.
funcs = []
for col in "one", "two":
    funcs.append(lambda col=col: print(f"{col=}"))
for func in funcs: func()
col='one'
col='two'
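If the extra default argument feels too easy to overlook, two other standard ways to bind the value early are a small factory function or functools.partial. This is only a sketch of those alternatives; make_printer and show are hypothetical helper names, and the functions return strings instead of printing so the results are easy to compare.

```python
from functools import partial


def make_printer(col):
    # col is a parameter here, so each call to make_printer gets its own binding
    return lambda: f"col={col!r}"


def show(col):
    return f"col={col!r}"


# Factory function: the value is fixed when make_printer(col) is called
factory_funcs = [make_printer(col) for col in ("one", "two")]

# functools.partial: the argument is stored when the partial object is created
partial_funcs = [partial(show, col) for col in ("one", "two")]

factory_results = [func() for func in factory_funcs]
partial_results = [func() for func in partial_funcs]

print(factory_results)  # ["col='one'", "col='two'"]
print(partial_results)  # ["col='one'", "col='two'"]
```

Both variants behave like the col=col default, so which one to use is mostly a matter of taste.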
In your specific example:
pl.col(col).name.map_fields(lambda x, col=col: f'{col} ({x})') for col in cols
#                                     ^^^^^^^
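The renaming functions can also be checked on their own, without Polars, by building both versions of the list and calling each lambda directly. This is a sketch using the cols list from the question; the names late and bound are made up for the illustration.

```python
cols = ['Delta', 'Qty']

# Original version: every lambda reads col when it is called
late = [lambda x: f'{col} ({x})' for col in cols]

# Fixed version: col=col captures the current value when each lambda is created
bound = [lambda x, col=col: f'{col} ({x})' for col in cols]

late_names = [func('ABC') for func in late]
bound_names = [func('ABC') for func in bound]

print(late_names)   # ['Qty (ABC)', 'Qty (ABC)']
print(bound_names)  # ['Delta (ABC)', 'Qty (ABC)']
```

This also shows why df.unnest('Qty') looked correct in the question: 'Qty' is the last item in cols, so the late-binding lambda happens to produce the right prefix for that column only.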