dataframe, python-polars, rust-polars

How to aggregate over all rows in a Polars dataframe?


In a Polars dataframe, I know that I can aggregate over a group of rows that share the same value in a column using, for example, .groupby("first_name").agg([...]).

How can I aggregate over all rows in a dataframe?

For example, I'd like to get the mean of all values in a column.


Solution

  • As suggested by @jqurious, you can use mean() directly inside select or with_columns to obtain the mean over all rows, without grouping first.

    Examples.

    import polars as pl
    
    # sample dataframe
    df = pl.DataFrame({
        'text':['a','a','b','b'],
        'value':[1,2,3,4]
    })
    
    shape: (4, 2)
    ┌──────┬───────┐
    │ text ┆ value │
    │ ---  ┆ ---   │
    │ str  ┆ i64   │
    ╞══════╪═══════╡
    │ a    ┆ 1     │
    │ a    ┆ 2     │
    │ b    ┆ 3     │
    │ b    ┆ 4     │
    └──────┴───────┘
    
    # compute the mean with select
    df.select(
        value_mean = pl.mean('value')
    )
    
    shape: (1, 1)
    ┌────────────┐
    │ value_mean │
    │ ---        │
    │ f64        │
    ╞════════════╡
    │ 2.5        │
    └────────────┘
    
    # add the mean with with_columns
    
    df.with_columns(
        value_mean = pl.mean('value')
    )
    
    shape: (4, 3)
    ┌──────┬───────┬────────────┐
    │ text ┆ value ┆ value_mean │
    │ ---  ┆ ---   ┆ ---        │
    │ str  ┆ i64   ┆ f64        │
    ╞══════╪═══════╪════════════╡
    │ a    ┆ 1     ┆ 2.5        │
    │ a    ┆ 2     ┆ 2.5        │
    │ b    ┆ 3     ┆ 2.5        │
    │ b    ┆ 4     ┆ 2.5        │
    └──────┴───────┴────────────┘
    
    

    Using select, only the columns specified in select will show up in the result. Using with_columns, all columns will show up in the result plus any column you add or modify.

    That is why the result of select is a single row, while the result of with_columns keeps the 4 rows of the sample dataframe, with the mean repeated on each row.