python dataframe python-polars

How to create new columns based on a grouping method for one column in Polars?


I have some data structured as shown below, and I would like to restructure the dataframe. A short piece of the initial data:

id time value
2050 02-01 20
2051 02-01 25
2050 02-02 21
2051 02-02 22
2051 02-03 23

I would like the restructured dataframe to have a timestamp column followed by one column for each id (externallogid). I have done this with pandas, but since the file is quite large and must be processed multiple times, I would like to do it in Polars for the speed.

Expected output:

time 2050 2051
02-01 20 25
02-02 21 22
02-03 nan 23

I have tried using the groupby function together with join/hstack/concat, but I run into problems when I also try to use LazyFrames.

Thanks

To produce the data:

import polars as pl

lf = pl.DataFrame({'id': [2050, 2051, 2050, 2051, 2051],
                   'time': ['2023-05-01',
                            '2023-05-01',
                            '2023-05-02',
                            '2023-05-02',
                            '2023-05-03'],
                   'value': [20, 25, 21, 22, 23]})
lf = lf.with_columns(pl.col("time").str.to_datetime("%Y-%m-%d"))

Solution

  • You should use .pivot():

    In [29]: lf.pivot('id', index='time')
    Out[29]:
    shape: (3, 3)
    ┌─────────────────────┬──────┬──────┐
    │ time                ┆ 2050 ┆ 2051 │
    │ ---                 ┆ ---  ┆ ---  │
    │ datetime[μs]        ┆ i64  ┆ i64  │
    ╞═════════════════════╪══════╪══════╡
    │ 2023-05-01 00:00:00 ┆ 20   ┆ 25   │
    │ 2023-05-02 00:00:00 ┆ 21   ┆ 22   │
    │ 2023-05-03 00:00:00 ┆ null ┆ 23   │
    └─────────────────────┴──────┴──────┘