
Horizontal reduction operation on Array columns


An example of a horizontal operation is pl.any_horizontal, which can be applied across pl.Boolean columns, e.g.

import polars as pl

df = pl.DataFrame(
    {
        "a": pl.Series(
            "a", 
            [False, False],
            dtype=pl.Boolean,
        ),
        "b": pl.Series(
            "b", 
            [True, False],
            dtype=pl.Boolean,
        ),
    }
)

df.with_columns(pl.any_horizontal("a", "b"))
shape: (2, 3)
┌───────┬───────┬───────┐
│ a     ┆ b     ┆ any   │
│ ---   ┆ ---   ┆ ---   │
│ bool  ┆ bool  ┆ bool  │
╞═══════╪═══════╪═══════╡
│ false ┆ true  ┆ true  │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘

Is there a way to apply the same operation across columns containing pl.Array data types? That is

df = pl.DataFrame(
    {
        "a": pl.Series(
            "a", 
            [[False, False], [False, True]],
            dtype=pl.Array(width=2, inner=pl.Boolean),
        ),
        "b": pl.Series(
            "b", 
            [[True, False], [True, True]],
            dtype=pl.Array(width=2, inner=pl.Boolean),
        ),
    }
)

df.with_columns(pl.any_horizontal("a", "b"))
shape: (2, 3)
┌────────────────┬────────────────┬────────────────┐
│ a              ┆ b              ┆ any            │
│ ---            ┆ ---            ┆ ---            │
│ array[bool, 2] ┆ array[bool, 2] ┆ array[bool, 2] │
╞════════════════╪════════════════╪════════════════╡
│ [false, false] ┆ [true, false]  ┆ [true, false]  │
│ [false, true]  ┆ [true, true]   ┆ [true, true]   │
└────────────────┴────────────────┴────────────────┘
(This is the result I want to get, but an error is thrown instead.)

In theory this should be a well-defined operation as long as the pl.Array columns all have the same width (sum_horizontal, for example, could be defined the same way). For boolean arrays, another option would be to work with binary strings. Any suggestions?


Solution

  • explode/implode over a row indicator seems to work. These are copy-free operations btw.

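    (
        df
        .with_columns(
            any=(
                # Flatten each array, combine element-wise, then regroup per row.
                pl.any_horizontal(pl.col("a").explode(), pl.col("b").explode())
                .implode()
                # A unique index per row makes every row its own window.
                # Note: no-argument pl.count() is deprecated in newer Polars; use pl.len() there.
                .over(pl.int_range(0, pl.count()))
            ).cast(pl.Array(width=2, inner=pl.Boolean))  # implode yields a List; cast back to Array
        )
    )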
    
    shape: (2, 3)
    ┌────────────────┬────────────────┬────────────────┐
    │ a              ┆ b              ┆ any            │
    │ ---            ┆ ---            ┆ ---            │
    │ array[bool, 2] ┆ array[bool, 2] ┆ array[bool, 2] │
    ╞════════════════╪════════════════╪════════════════╡
    │ [false, false] ┆ [true, false]  ┆ [true, false]  │
    │ [false, true]  ┆ [true, true]   ┆ [true, true]   │
    └────────────────┴────────────────┴────────────────┘