Search code examples
pythondataframepython-polars

Mark rows of one dataframe based on values from another dataframe


I have following problem. Let's say I have two dataframes

df1 = pl.DataFrame({'a': range(10)})
df2 = pl.DataFrame({'b': [[1, 3], [5,6], [8, 9]], 'tags': ['aa', 'bb', 'cc']})
print(df1)
print(df2)
shape: (10, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 1   │
│ 2   │
│ 3   │
│ 4   │
│ 5   │
│ 6   │
│ 7   │
│ 8   │
│ 9   │
└─────┘
shape: (3, 2)
┌───────────┬──────┐
│ b         ┆ tags │
│ ---       ┆ ---  │
│ list[i64] ┆ str  │
╞═══════════╪══════╡
│ [1, 3]    ┆ aa   │
│ [5, 6]    ┆ bb   │
│ [8, 9]    ┆ cc   │
└───────────┴──────┘

I need to mark/tag rows in dataframe df1 based on values of dataframe df2, so I can get following dataframe

print(pl.DataFrame({'a': range(10), 'tag': ['NA', 'aa', 'aa', 'aa', 'NA', 'bb', 'bb', 'NA', 'cc', 'cc']}))
shape: (10, 2)
┌─────┬─────┐
│ a   ┆ tag │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 0   ┆ NA  │
│ 1   ┆ aa  │
│ 2   ┆ aa  │
│ 3   ┆ aa  │
│ 4   ┆ NA  │
│ 5   ┆ bb  │
│ 6   ┆ bb  │
│ 7   ┆ NA  │
│ 8   ┆ cc  │
│ 9   ┆ cc  │
└─────┴─────┘

So list in column b of df2 indicates start and end values for column a of df1 that needs to be tagged with what's in column tags.

Thanks


Solution

  • You could create the ranges and "flatten" the frame:

    (df2
      .with_columns(pl.int_ranges(pl.col("b").list.first(), pl.col("b").list.last() + 1))
      .explode("b")
    )
    
    shape: (7, 2)
    ┌─────┬──────┐
    │ b   ┆ tags │
    │ --- ┆ ---  │
    │ i64 ┆ str  │
    ╞═════╪══════╡
    │ 1   ┆ aa   │
    │ 2   ┆ aa   │
    │ 3   ┆ aa   │
    │ 5   ┆ bb   │
    │ 6   ┆ bb   │
    │ 8   ┆ cc   │
    │ 9   ┆ cc   │
    └─────┴──────┘
    

    Matching could then be done with a LEFT JOIN

    df1.join(
       df2.with_columns(
          pl.int_ranges(pl.col("b").list.first(), pl.col("b").list.last() + 1)).explode("b"),
       left_on = "a",
       right_on = "b",
       how = "left"
    )
    
    shape: (10, 2)
    ┌─────┬──────┐
    │ a   ┆ tags │
    │ --- ┆ ---  │
    │ i64 ┆ str  │
    ╞═════╪══════╡
    │ 0   ┆ null │
    │ 1   ┆ aa   │
    │ 2   ┆ aa   │
    │ 3   ┆ aa   │
    │ 4   ┆ null │
    │ 5   ┆ bb   │
    │ 6   ┆ bb   │
    │ 7   ┆ null │
    │ 8   ┆ cc   │
    │ 9   ┆ cc   │
    └─────┴──────┘