I have a polars dataframe with subject_id, timestamp, event, col1, and col2 columns.
I want to split this dataframe into two polars dataframes (one with subject_id, timestamp, event and one with subject_id, timestamp, col1, col2), but first create a unique id column so that I can use that id to join the split dataframes after grouping/manipulating them separately.
How can I create this unique id column in polars, with one unique id for every unique subject_id, timestamp pair in the dataframe, before splitting?
Essentially, I wish to do what this post provided, but in Polars. I understand Polars does not have indexes, so what is the best approach?
Looks like I just had to do a bit more digging - it's helpful to first find a solution in pandas and then try to replicate it using polars. Answer from this post (its id and cat columns play the role of subject_id and timestamp here; ngroup is the unique id):
(
    # Add row index.
    df.with_row_index()
    # Group on id and cat column.
    .group_by(
        ["id", "cat"],
        maintain_order=True,
    )
    .agg(
        # Create a list of all index positions per group.
        pl.col("index")
    )
    # Add a new row count for each group.
    .with_row_index("ngroup")
    # Expand index list column to separate rows.
    .explode("index")
    # Reorder columns.
    .select("index", "ngroup", "id", "cat")
    # Optionally sort by original order.
    .sort("index")
)