I have a df:
import polars as pl

df = pl.DataFrame({
    "A": [0],
    "B": [1],
    '{"C_C", "1"}': [2],
    '{"D_D", "6"}': [3],
})
I want to change the column names so that, if a name contains quoted strings, those strings are joined with an underscore and _count is added at the end, so {"C_C", "1"} becomes C_C_1_count. I have tried:
def flatten_pivot_polars(d: pl.DataFrame, col_str: str) -> pl.DataFrame:
    import re
    d = d.select(
        pl.exclude(["Step", "RunId"]).name.map(
            lambda col_name: '_'.join([re.findall('"([^"]*)"', col_name), col_str])
        )
    )
    return d

flatten_pivot_polars(df, 'count')
but this gives:
ComputeError: Python function in 'name.map' produced an error:
TypeError: sequence item 0: expected str instance, list found.
I am guessing this is because I am not excluding the non-quoted columns properly, but I don't know what else to do.
re.findall returns all non-overlapping matches as a list of strings. You want to append col_str to this list, which you can do with:
re.findall('"([^"]*)"',col_name) + [col_str]
instead of
[re.findall('"([^"]*)"',col_name), col_str]
which produces a nested list, [[matches], col_str]. str.join then hits a list as its first item, which is the TypeError you are seeing.
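One thing to watch for: with that change alone, names without any quoted parts (A and B here) produce an empty match list, so both would be renamed to just count, which I would expect polars to reject as a duplicate column name. Below is a minimal sketch of one way around that, assuming you want unquoted names left unchanged. The helper name flatten_name and its suffix parameter are made up for illustration, and the snippet only builds the rename mapping in plain Python, so it runs without polars.

```python
import re


def flatten_name(col_name: str, suffix: str = "count") -> str:
    """Join the double-quoted parts of col_name with '_' and append suffix.

    Hypothetical helper: names with no quoted parts are returned unchanged.
    """
    parts = re.findall(r'"([^"]*)"', col_name)
    if not parts:
        # No quoted parts (e.g. "A"), so keep the original name.
        return col_name
    # Flat list of strings: the matches followed by the suffix.
    return "_".join(parts + [suffix])


# Column names taken from the DataFrame in the question.
columns = ["A", "B", '{"C_C", "1"}', '{"D_D", "6"}']

# Old name -> new name, in the original column order.
mapping = {name: flatten_name(name) for name in columns}
print(mapping)
```

With the DataFrame from the question, passing this mapping to df.rename(mapping) should give the columns A, B, C_C_1_count and D_D_6_count. Building it from df.columns instead of a hard-coded list keeps it in sync with the frame.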