Search code examples
pythonibis

Use Ibis to return aggregates of every numeric column in a table


I want to use Ibis to return the mean and standard deviation of each floating point numeric column in a table. How do I do that?


Solution

  • Here's a standalone example that uses Ibis to return the mean and standard deviation of each floating point column in the iris flower dataset. This uses Ibis selectors:

    import ibis
    from ibis import _
    import ibis.selectors as s
    import ibis.expr.datatypes as dt
    import seaborn as sns
    
    iris = ibis.memtable(sns.load_dataset("iris"))
    
    iris.aggregate(
        s.across(
            s.of_type(dt.float()),
            dict(mean=ibis._.mean(), std=ibis._.std())
        )
    )
    

    This returns:

    ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
    ┃ sepal_length_mean ┃ sepal_width_mean ┃ petal_length_mean ┃ petal_width_mean ┃ sepal_length_std ┃ sepal_width_std ┃ petal_length_std ┃ petal_width_std ┃
    ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
    │ float64           │ float64          │ float64           │ float64          │ float64          │ float64         │ float64          │ float64         │
    ├───────────────────┼──────────────────┼───────────────────┼──────────────────┼──────────────────┼─────────────────┼──────────────────┼─────────────────┤
    │          5.843333 │         3.057333 │             3.758 │         1.199333 │         0.828066 │        0.435866 │         1.765298 │        0.762238 │
    └───────────────────┴──────────────────┴───────────────────┴──────────────────┴──────────────────┴─────────────────┴──────────────────┴─────────────────┘
    

    You can replace s.of_type(dt.float()) with s.numeric() to apply this to all numeric columns including integers. Or you can use other members of dt to refer to other column types.