I have some data structured as shown below, and I would like to restructure the dataframe.
Short piece of the initial data:
id | time | value |
---|---|---|
2050 | 02-01 | 20 |
2051 | 02-01 | 25 |
2050 | 02-02 | 21 |
2051 | 02-02 | 22 |
2051 | 02-03 | 23 |
I would like the restructured dataframe to have a timestamp column and then one column for each externallogid (the id column above). I have done this with pandas, but since the file is very large and must be processed multiple times, I would like to do it in Polars for the speed.
Expected output:
time | 2050 | 2051 |
---|---|---|
02-01 | 20 | 25 |
02-02 | 21 | 22 |
02-03 | nan | 23 |
I have tried using the groupby function and join/hstack/concat, but I run into problems when also trying to use LazyFrames.
Thanks
To produce the data:
import polars as pl
lf = pl.DataFrame({'id': [2050, 2051, 2050, 2051, 2051],
                   'time': ['2023-05-01',
                            '2023-05-01',
                            '2023-05-02',
                            '2023-05-02',
                            '2023-05-03'],
                   'value': [20, 25, 21, 22, 23]})
lf = lf.with_columns(pl.col("time").str.to_datetime("%Y-%m-%d"))
You should use .pivot():
In [29]: lf.pivot('id', index='time')
Out[29]:
shape: (3, 3)
┌─────────────────────┬──────┬──────┐
│ time                ┆ 2050 ┆ 2051 │
│ ---                 ┆ ---  ┆ ---  │
│ datetime[μs]        ┆ i64  ┆ i64  │
╞═════════════════════╪══════╪══════╡
│ 2023-05-01 00:00:00 ┆ 20   ┆ 25   │
│ 2023-05-02 00:00:00 ┆ 21   ┆ 22   │
│ 2023-05-03 00:00:00 ┆ null ┆ 23   │
└─────────────────────┴──────┴──────┘