
Count positive and negative values on the rows


I have a df as follows:

[image: input DataFrame, not preserved]

with n known only at runtime.

I need to count the 1 and -1 values in each row.

Namely, I need a new df (or new columns in the old one):

[image: desired output, not preserved]

Any advice?


Solution

  • You can use polars.sum_horizontal with an expression to sum across the columns of each row.

    For example, starting with this data

    import polars as pl
    
    data_frame = (
        pl.DataFrame({
            'col0': [1, -1, 1, -1, 1],
            'col1': [1, 1, 1, 1, 1],
            'col2': [-1, -1, -1, -1, -1],
            'col3': [1, -1, -1, 1, 1],
        })
    )
    data_frame
    
    shape: (5, 4)
    ┌──────┬──────┬──────┬──────┐
    │ col0 ┆ col1 ┆ col2 ┆ col3 │
    │ ---  ┆ ---  ┆ ---  ┆ ---  │
    │ i64  ┆ i64  ┆ i64  ┆ i64  │
    ╞══════╪══════╪══════╪══════╡
    │ 1    ┆ 1    ┆ -1   ┆ 1    │
    │ -1   ┆ 1    ┆ -1   ┆ -1   │
    │ 1    ┆ 1    ┆ -1   ┆ -1   │
    │ -1   ┆ 1    ┆ -1   ┆ 1    │
    │ 1    ┆ 1    ┆ -1   ┆ 1    │
    └──────┴──────┴──────┴──────┘
    

    We can count the positive and negative values in each row by summing boolean comparisons over all columns, selected with polars.all.

    (
        data_frame
        .with_columns(
            pl.sum_horizontal(pl.all() > 0).alias('pos'),
            pl.sum_horizontal(pl.all() < 0).alias('neg'),
        )
    )
    
    shape: (5, 6)
    ┌──────┬──────┬──────┬──────┬─────┬─────┐
    │ col0 ┆ col1 ┆ col2 ┆ col3 ┆ pos ┆ neg │
    │ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ --- │
    │ i64  ┆ i64  ┆ i64  ┆ i64  ┆ u32 ┆ u32 │
    ╞══════╪══════╪══════╪══════╪═════╪═════╡
    │ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
    │ -1   ┆ 1    ┆ -1   ┆ -1   ┆ 1   ┆ 3   │
    │ 1    ┆ 1    ┆ -1   ┆ -1   ┆ 2   ┆ 2   │
    │ -1   ┆ 1    ┆ -1   ┆ 1    ┆ 2   ┆ 2   │
    │ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
    └──────┴──────┴──────┴──────┴─────┴─────┘
    

    How it works

    This works because Polars upcasts boolean values to integers when summing, so each true counts as 1 and each false as 0. For example, the expression pl.all() > 0 produces one boolean expression per column.

    (
        data_frame
        .with_columns(
            pl.all() > 0
        )
    )
    
    shape: (5, 4)
    ┌───────┬──────┬───────┬───────┐
    │ col0  ┆ col1 ┆ col2  ┆ col3  │
    │ ---   ┆ ---  ┆ ---   ┆ ---   │
    │ bool  ┆ bool ┆ bool  ┆ bool  │
    ╞═══════╪══════╪═══════╪═══════╡
    │ true  ┆ true ┆ false ┆ true  │
    │ false ┆ true ┆ false ┆ false │
    │ true  ┆ true ┆ false ┆ false │
    │ false ┆ true ┆ false ┆ true  │
    │ true  ┆ true ┆ false ┆ true  │
    └───────┴──────┴───────┴───────┘
    

    polars.sum_horizontal will then convert these to integers as it sums them horizontally.

    For examples of how to select only certain columns (by name, by type, by regex, etc.), see this Stack Overflow answer.