import polars as pl
df = pl.DataFrame({
'A': range(1,4),
'B': range(1,4),
'C': range(1,4),
'D': range(1,4)
})
print(df)
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A ┆ B ┆ C ┆ D │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1 ┆ 1 ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ 3 │
└─────┴─────┴─────┴─────┘
df_long = df.unpivot(
variable_name="recipe",
value_name="revenue")
print(df_long)
shape: (12, 2)
┌────────┬─────────┐
│ recipe ┆ revenue │
│ --- ┆ --- │
│ str ┆ i64 │
╞════════╪═════════╡
│ A ┆ 1 │
│ A ┆ 2 │
│ A ┆ 3 │
│ B ┆ 1 │
│ B ┆ 2 │
│ … ┆ … │
│ C ┆ 2 │
│ C ┆ 3 │
│ D ┆ 1 │
│ D ┆ 2 │
│ D ┆ 3 │
└────────┴─────────┘
It seems I need to add an index in order to pivot df_long
back into the original form of df
? Is there no way to pivot a polars dataframe without adding an index?
df_long = df_long.with_columns(index=pl.col("revenue").cum_count().over("recipe"))
df_long.pivot(
on='recipe',
index='index',
values='revenue',
aggregate_function='first'
)
shape: (3, 5)
┌───────┬─────┬─────┬─────┬─────┐
│ index ┆ A ┆ B ┆ C ┆ D │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╪═════╪═════╡
│ 1 ┆ 1 ┆ 1 ┆ 1 ┆ 1 │
│ 2 ┆ 2 ┆ 2 ┆ 2 ┆ 2 │
│ 3 ┆ 3 ┆ 3 ┆ 3 ┆ 3 │
└───────┴─────┴─────┴─────┴─────┘
In R, I can perform the equivalent to unpivot and pivot without indexing, and was seeking the same functionality in Python.
df_pandas = df.to_pandas()
library(tidyverse)
library(reticulate)
df_long <-
py$df_pandas |>
pivot_longer(
everything(),
names_to = 'recipe',
values_to = 'value'
)
df_long |>
pivot_wider(
names_from='recipe',
values_from='value'
) |>
unnest(cols = c(A,B,C,D))
It seems I need to add an index in order to pivot
df_long
back into the original form ofdf
?
How would it otherwise decide that the entries in df_long
with revenue=1
are all on the same row, and not for recipe=A
the revenue is 2, for recipe=B
, the revenue is 3, etc? You are inferring that from how you defined df
, but from just df_long
as defined in your question, that is not possible.
If this is a requirement, than I suggest you add the index to df
before using unpivot
, so that df_long
carries that information:
>>> df_long=df.with_row_index().unpivot(index=["index"], variable_name="recipe",value_name="revenue")
>>> df_long
shape: (12, 3)
┌───────┬────────┬─────────┐
│ index ┆ recipe ┆ revenue │
│ --- ┆ --- ┆ --- │
│ u32 ┆ str ┆ i64 │
╞═══════╪════════╪═════════╡
│ 0 ┆ A ┆ 1 │
│ 1 ┆ A ┆ 2 │
│ 2 ┆ A ┆ 3 │
│ 0 ┆ B ┆ 1 │
│ 1 ┆ B ┆ 2 │
│ … ┆ … ┆ … │
│ 1 ┆ C ┆ 2 │
│ 2 ┆ C ┆ 3 │
│ 0 ┆ D ┆ 1 │
│ 1 ┆ D ┆ 2 │
│ 2 ┆ D ┆ 3 │
└───────┴────────┴─────────┘
The pivot can then use the index
column to reconstruct the original.