How to select the column from a Polars Dataframe that has the largest sum?

I have Polars dataframe with a bunch of columns I need to find the column with, for example, the largest sum.

The below snippet sums all of the columns:

df = pl.DataFrame(
    {
        "a": [0, 1, 3, 4],
        "b": [0, 0, 0, 0],
        "c": [1, 0, 1, 0],
    }
)

max_col = df.select(pl.col(df.columns).sum())

shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 8   ┆ 0   ┆ 2   │
└─────┴─────┴─────┘

But I'm missing the last step of selecting the column with the largest value?

Solution

I would do this as a unpivot/filter.

  df \
    .select(pl.all().sum()) \
    .unpivot() \
    .filter(pl.col('value')==pl.col('value').max())

If you want the original shape then a single chain is a bit tougher. I'd just do it like this instead.

allcalc=df \
    .select(pl.all().sum())
allcalc.select(allcalc.unpivot().filter(pl.col('value')==pl.col('value').max()) \
       .get_column('variable').to_list())

The above works if there is a tie, for instance if you have:

df=pl.DataFrame(
    {
        "a": [0, 1, 3, 4],
        "b": [0, 0, 0, 0],
        "c": [1, 0, 1, 6],
    }
)

then you'll get 'a' and 'c' in either case.