I have following problem. Let's say I have two dataframes
df1 = pl.DataFrame({'a': range(10)})
df2 = pl.DataFrame({'b': [[1, 3], [5,6], [8, 9]], 'tags': ['aa', 'bb', 'cc']})
print(df1)
print(df2)
shape: (10, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└─────┘
shape: (3, 2)
┌───────────┬──────┐
│ b ┆ tags │
│ --- ┆ --- │
│ list[i64] ┆ str │
╞═══════════╪══════╡
│ [1, 3] ┆ aa │
│ [5, 6] ┆ bb │
│ [8, 9] ┆ cc │
└───────────┴──────┘
I need to mark/tag rows in dataframe df1
based on values of dataframe df2
, so I can get following dataframe
print(pl.DataFrame({'a': range(10), 'tag': ['NA', 'aa', 'aa', 'aa', 'NA', 'bb', 'bb', 'NA', 'cc', 'cc']}))
shape: (10, 2)
┌─────┬─────┐
│ a ┆ tag │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0 ┆ NA │
│ 1 ┆ aa │
│ 2 ┆ aa │
│ 3 ┆ aa │
│ 4 ┆ NA │
│ 5 ┆ bb │
│ 6 ┆ bb │
│ 7 ┆ NA │
│ 8 ┆ cc │
│ 9 ┆ cc │
└─────┴─────┘
So list in column b
of df2
indicates start and end values for column a
of df1
that needs to be tagged with what's in column tags
.
Thanks
You could create the ranges and "flatten" the frame:
(df2
.with_columns(pl.int_ranges(pl.col("b").list.first(), pl.col("b").list.last() + 1))
.explode("b")
)
shape: (7, 2)
┌─────┬──────┐
│ b ┆ tags │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ aa │
│ 2 ┆ aa │
│ 3 ┆ aa │
│ 5 ┆ bb │
│ 6 ┆ bb │
│ 8 ┆ cc │
│ 9 ┆ cc │
└─────┴──────┘
Matching could then be done with a LEFT JOIN
df1.join(
df2.with_columns(
pl.int_ranges(pl.col("b").list.first(), pl.col("b").list.last() + 1)).explode("b"),
left_on = "a",
right_on = "b",
how = "left"
)
shape: (10, 2)
┌─────┬──────┐
│ a ┆ tags │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 0 ┆ null │
│ 1 ┆ aa │
│ 2 ┆ aa │
│ 3 ┆ aa │
│ 4 ┆ null │
│ 5 ┆ bb │
│ 6 ┆ bb │
│ 7 ┆ null │
│ 8 ┆ cc │
│ 9 ┆ cc │
└─────┴──────┘