python-polars

How to sum multiple columns by regex in Polars?


I have multiple columns whose names start with "ts", such as ts_1, ts_2, ts_3, etc. I want to sum these f64 values row by row, but I don't know the exact column names in advance. If I select them with a regex such as pl.col('^ts.*$'), how do I sum the values?


Solution

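  • You can use polars.sum_horizontal with a regex. A string that starts with ^ and ends with $ is interpreted as a regular expression over the column names, so every matching column is included in the row-wise sum.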

    import polars as pl
    
    df = pl.DataFrame({
        'ts_1': [1, 2, 3, 4],
        'a': [-100] * 4,
        'ts_2': [10] * 4,
        'b': [-1000] * 4,
        'ts_3': [100] * 4,
    })
    
    df.with_columns(
        pl.sum_horizontal('^ts_.*$').alias('ts_sum')
    )
    
    shape: (4, 6)
    ┌──────┬──────┬──────┬───────┬──────┬────────┐
    │ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 ┆ ts_sum │
    │ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---    │
    │ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  ┆ i64    │
    ╞══════╪══════╪══════╪═══════╪══════╪════════╡
    │ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 111    │
    │ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 112    │
    │ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 113    │
    │ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 114    │
    └──────┴──────┴──────┴───────┴──────┴────────┘