Consider the following dataframe.
import polars as pl

df = pl.DataFrame(data={"col1": range(10)})
┌──────┐
│ col1 │
│ --- │
│ i64 │
╞══════╡
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└──────┘
Let's say I have a list of tuples, where the first value is the start index and the second value is the length (as used in pl.DataFrame.slice). This might look like this:
slices = [(1,2), (5,3)]
Now, what's a good way to slice/extract two chunks out of df, where the first chunk starts at row 1 and has a length of 2, and the second chunk starts at row 5 and has a length of 3?
Here's what I am looking for:
┌──────┐
│ col1 │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 5 │
│ 6 │
│ 7 │
└──────┘
You could use pl.DataFrame.slice to obtain each slice separately and then use pl.concat to concatenate all slices.
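# Unpack each (offset, length) pair explicitly; this also avoids shadowing the built-in `slice`.
pl.concat(df.slice(offset, length) for offset, length in slices)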
shape: (5, 1)
┌──────┐
│ col1 │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 5 │
│ 6 │
│ 7 │
└──────┘
Edit. As an attempt at a vectorized approach, you could first use the list of slice parameters to create a dataframe of indices (using pl.int_ranges and pl.DataFrame.explode). Afterwards, this dataframe of indices can be joined with df to select the matching rows.
indices = (
    pl.DataFrame(slices, orient="row", schema=["offset", "length"])
    .select(
        index=pl.int_ranges("offset", pl.col("offset") + pl.col("length"))
    )
    .explode("index")
)
shape: (5, 1)
┌───────┐
│ index │
│ --- │
│ i64 │
╞═══════╡
│ 1 │
│ 2 │
│ 5 │
│ 6 │
│ 7 │
└───────┘
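For intuition, the int_ranges and explode steps expand each (offset, length) pair into a run of consecutive row indices and then flatten those runs into one column. A plain-Python sketch of the same expansion (an illustration only, not how Polars computes it internally):

```python
from itertools import chain

slices = [(1, 2), (5, 3)]

# One range per (offset, length) pair, mirroring pl.int_ranges ...
ranges = [range(offset, offset + length) for offset, length in slices]

# ... then flattened into a single sequence, mirroring explode.
row_indices = list(chain.from_iterable(ranges))
print(row_indices)  # [1, 2, 5, 6, 7]
```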
(
    indices
    .join(
        df,
        left_on="index",
        right_on=pl.int_range(pl.len()),
        how="left",
        coalesce=True,
    )
    .drop("index")
)
shape: (5, 1)
┌──────┐
│ col1 │
│ --- │
│ i64 │
╞══════╡
│ 1 │
│ 2 │
│ 5 │
│ 6 │
│ 7 │
└──────┘