Hy!
I need to unstack some columns in polars, but i want to group the columns by column 'a'
seq_ = df['a'].unique()
l = len(seq_)
dfs = []
for idx, a in enumerate(seq_):
df_tmp = df.filter(pl.col('a') == a)
steps = df_tmp.shape[0]
df_tmp = df_tmp.unstack(step=steps, how="horizontal",columns=['b', 'c'])
df_tmp = df_tmp.with_columns(pl.lit(a).alias('a'))
dfs.append(df_tmp)
df_tmp = pl.concat(dfs, how='diagonal')
this code is too much slower. Is there any way to do this in polars more faster?
from this:
df = pl.from_repr("""
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ a ┆ f │
│ 2 ┆ b ┆ g │
│ 1 ┆ c ┆ h │
│ 1 ┆ d ┆ i │
│ 2 ┆ e ┆ j │
└─────┴─────┴─────┘
""")
to this:
┌─────┬─────┬─────┬──────┬─────┬─────┬──────┐
│ a ┆ b_0 ┆ b_1 ┆ b_2 ┆ c_0 ┆ c_1 ┆ c_2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪═════╪═════╪══════╪═════╪═════╪══════╡
│ 1 ┆ a ┆ c ┆ d ┆ f ┆ h ┆ i │
│ 2 ┆ b ┆ e ┆ null ┆ g ┆ j ┆ null │
└─────┴─────┴─────┴──────┴─────┴─────┴──────┘
group_by
with all
aggregation will transform the b
and c
columns into lists.list.to_struct()
, you can name each element with a custom field name, and it will be padded out with null
s at this time per row. (The reason there's df = ...
steps below instead of all in one step is due to the way fields
works when passing in a late-binding lambda, run into duplicate field issues)unnest
makes each field
of a struct
column its own column.df = df.group_by('a', maintain_order=True).all()
for col in 'bc':
df = df.with_columns(
pl.col(col).list.to_struct(
n_field_strategy='max_width', fields=lambda i: f'{col}_{i}'
)
).unnest(col)
shape: (2, 7)
┌─────┬─────┬─────┬──────┬─────┬─────┬──────┐
│ a ┆ b_0 ┆ b_1 ┆ b_2 ┆ c_0 ┆ c_1 ┆ c_2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │
╞═════╪═════╪═════╪══════╪═════╪═════╪══════╡
│ 1 ┆ a ┆ c ┆ d ┆ f ┆ h ┆ i │
│ 2 ┆ b ┆ e ┆ null ┆ g ┆ j ┆ null │
└─────┴─────┴─────┴──────┴─────┴─────┴──────┘
There might be an alternate, cleaner solution with pivot
, but at a glance I don't think this quite fits the mold of a pivot
.