python dataframe python-polars

Explode multiple columns with different lengths


I have a dataframe like:

import polars as pl

data = {
    "a": [[1], [2], [3, 4], [5, 6, 7]],
    "b": [[], [8], [9, 10], [11, 12]],
}
df = pl.DataFrame(data)
"""
┌───────────┬───────────┐
│ a         ┆ b         │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1]       ┆ []        │
│ [2]       ┆ [8]       │
│ [3, 4]    ┆ [9, 10]   │
│ [5, 6, 7] ┆ [11, 12]  │
└───────────┴───────────┘
"""

The two lists in a row may differ in length. I want to explode both columns together, truncating each pair to the length of the shorter list, so a row where either list is empty produces no output rows:

"""
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 8   │
│ 3   ┆ 9   │
│ 4   ┆ 10  │
│ 5   ┆ 11  │
│ 6   ┆ 12  │
└─────┴─────┘
"""

I considered padding the shorter list in each row with None so that both lengths match, and then calling drop_nulls. Is there a more direct approach?


Solution

  • Here's one approach:

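    # Per-row length of the shorter of the two lists
    min_length = pl.min_horizontal(pl.col('a', 'b').list.len())

    out = (
        df.filter(min_length != 0)  # drop rows where either list is empty
        # truncate both lists to the shorter length
        .with_columns(pl.col('a', 'b').list.head(min_length))
        # lengths now match row by row, so both columns explode together
        .explode('a', 'b')
    )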
    

    Output:

    shape: (5, 2)
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 2   ┆ 8   │
    │ 3   ┆ 9   │
    │ 4   ┆ 10  │
    │ 5   ┆ 11  │
    │ 6   ┆ 12  │
    └─────┴─────┘
    

    Explanation