Search code examples
pythonpython-polars

After I unpivot a polars dataframe, how can I pivot it back to its original form without adding an index?


import polars as pl

df = pl.DataFrame({
    'A': range(1,4),
    'B': range(1,4),
    'C': range(1,4),
    'D': range(1,4)
})

print(df)
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ A   ┆ B   ┆ C   ┆ D   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 1   ┆ 1   │
│ 2   ┆ 2   ┆ 2   ┆ 2   │
│ 3   ┆ 3   ┆ 3   ┆ 3   │
└─────┴─────┴─────┴─────┘
df_long = df.unpivot(
    variable_name="recipe",
    value_name="revenue")

print(df_long)
shape: (12, 2)
┌────────┬─────────┐
│ recipe ┆ revenue │
│ ---    ┆ ---     │
│ str    ┆ i64     │
╞════════╪═════════╡
│ A      ┆ 1       │
│ A      ┆ 2       │
│ A      ┆ 3       │
│ B      ┆ 1       │
│ B      ┆ 2       │
│ …      ┆ …       │
│ C      ┆ 2       │
│ C      ┆ 3       │
│ D      ┆ 1       │
│ D      ┆ 2       │
│ D      ┆ 3       │
└────────┴─────────┘

It seems I need to add an index in order to pivot df_long back into the original form of df? Is there no way to pivot a polars dataframe without adding an index?

df_long = df_long.with_columns(index=pl.col("revenue").cum_count().over("recipe"))

df_long.pivot(
    on='recipe',
    index='index',
    values='revenue',
    aggregate_function='first'
)
shape: (3, 5)
┌───────┬─────┬─────┬─────┬─────┐
│ index ┆ A   ┆ B   ┆ C   ┆ D   │
│ ---   ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╪═════╪═════╡
│ 1     ┆ 1   ┆ 1   ┆ 1   ┆ 1   │
│ 2     ┆ 2   ┆ 2   ┆ 2   ┆ 2   │
│ 3     ┆ 3   ┆ 3   ┆ 3   ┆ 3   │
└───────┴─────┴─────┴─────┴─────┘

In R, I can perform the equivalent to melt and pivot without indexing, and was seeking the same functionality in Python.

df_pandas = df.to_pandas()
library(tidyverse)
library(reticulate)

df_long <-
    py$df_pandas |>
    pivot_longer(
        everything(),
        names_to = 'recipe',
        values_to = 'value'
    )

df_long |>
    pivot_wider(
        names_from='recipe',
        values_from='value'
    ) |>
    unnest(cols = c(A,B,C,D))

Solution

  • It seems I need to add an index in order to pivot df_long back into the original form of df?

    How would it otherwise decide that the entries in df_long with revenue=1 are all on the same row, and not for recipe=A the revenue is 2, for recipe=B, the revenue is 3, etc? You are inferring that from how you defined df, but from just df_long as defined in your question, that is not possible.

    If this is a requirement, than I suggest you add the index to df before using unpivot, so that df_long carries that information:

    >>> df_long=df.with_row_index().unpivot(index=["index"], variable_name="recipe",value_name="revenue")   
    >>> df_long
    shape: (12, 3)
    ┌───────┬────────┬─────────┐
    │ index ┆ recipe ┆ revenue │
    │ ---   ┆ ---    ┆ ---     │
    │ u32   ┆ str    ┆ i64     │
    ╞═══════╪════════╪═════════╡
    │ 0     ┆ A      ┆ 1       │
    │ 1     ┆ A      ┆ 2       │
    │ 2     ┆ A      ┆ 3       │
    │ 0     ┆ B      ┆ 1       │
    │ 1     ┆ B      ┆ 2       │
    │ …     ┆ …      ┆ …       │
    │ 1     ┆ C      ┆ 2       │
    │ 2     ┆ C      ┆ 3       │
    │ 0     ┆ D      ┆ 1       │
    │ 1     ┆ D      ┆ 2       │
    │ 2     ┆ D      ┆ 3       │
    └───────┴────────┴─────────┘
    

    The pivot can then use the index column to reconstruct the original.