python, python-ggplot, plotnine

How to create a bar chart with geomean, mean, max and min from data using Plotnine


How can I create a bar chart showing the geometric mean (gmean), mean, max and min of Y for each category in X? The data:

X B Y
A1 b1 4
A1 b2 2
A1 b3 3
A1 b4 8
A2 b1 7
A2 c1 10
A2 c2 8
A2 b3 7
A3 b4 10
A3 b5 9
A3 b1 4
A3 b3 1

The chart should look like this: (image not included)


Solution

  • You need to prepare the data first: calculate the aggregates you want to visualise, then plot them.

    import pandas as pd
    from plotnine import ggplot, aes, geom_col
    from scipy.stats import gmean
    from pandas.api.types import CategoricalDtype
    
    # Original Data
    df = pd.DataFrame({
        "X": sorted(("A1", "A2", "A3") * 4),
        "Y": [4, 2, 3, 8, 7, 10, 8, 7, 10, 9, 4, 1]
    })
    
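    # Calculate the aggregates per category, then reshape to long form
    # (one row per aggregate and X value), which is what the plot needs
    df2 = (df.groupby("X")
     .agg({"Y": [gmean, "mean", "max", "min"]})
     .unstack()        # the ("Y", <aggregate>) column levels move into the index
     .reset_index()    # unnamed levels become "level_0" and "level_1"; values land in column 0
     .rename(columns={0: "value", "level_1": "agg"})
    )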
    
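    # Fix the order of the aggregates (bars within each group and legend entries)
    df2["agg"] = df2["agg"].astype(CategoricalDtype(["gmean", "mean", "max", "min"]))
    
    # Dodged columns: one bar per aggregate, side by side for each X
    (ggplot(df2, aes("X", "value", fill="agg"))
     + geom_col(position="dodge")
    )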
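
As a possible alternative for the data preparation step, the same long-format table can be sketched with pandas named aggregation and `melt`, which avoids relying on the auto-generated `level_1` and `0` column names. This is not the answer's method, only a sketch under assumptions: `geometric_mean` is a hypothetical helper standing in for `scipy.stats.gmean` (it assumes strictly positive values), and `wide` and `tidy` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as in the answer above
df = pd.DataFrame({
    "X": sorted(("A1", "A2", "A3") * 4),
    "Y": [4, 2, 3, 8, 7, 10, 8, 7, 10, 9, 4, 1],
})

def geometric_mean(values):
    """Hypothetical helper: exp of the mean of logs (strictly positive input assumed)."""
    return float(np.exp(np.log(values).mean()))

# Named aggregation: one column per statistic, one row per category
wide = df.groupby("X")["Y"].agg(
    gmean=geometric_mean, mean="mean", max="max", min="min"
)

# Reshape to long form: one row per (X, aggregate) pair
tidy = wide.reset_index().melt(id_vars="X", var_name="agg", value_name="value")

# Fix the order in which the aggregates appear
tidy["agg"] = pd.Categorical(
    tidy["agg"], categories=["gmean", "mean", "max", "min"], ordered=True
)
```

`tidy` has the same `X`, `agg` and `value` columns that the plotting call in the answer uses, so it could be passed to `ggplot(tidy, aes("X", "value", fill="agg")) + geom_col(position="dodge")` in place of `df2`.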
    
