Search code examples
pythondataframesortingformattingdata-manipulation

Get statistical data (median/mean/count/90%/10%) from multiple column combinations of a data frame


Have a dataframe like below with 3 main variables, first variable has 2 unique values (True/false), second variable has 3 unique values (1/2/3), and third has 3 unique values (A/B/C), combining for 18 possible combinations.

Index Var1 Var2 Var3 Value
0 TRUE 1 A 12
1 FALSE 2 B 93
2 TRUE 3 C 28
3 FALSE 1 A 23
4 TRUE 2 B 94
5 FALSE 3 C 12
6 TRUE 1 A 85
7 FALSE 2 B 23
8 TRUE 3 C 2

I want to cycle through these combinations and then print out a data frame similar to the one shown below for each combination (18 total dfs), filled with statistical data, which would most likely require a function but I'm not exactly sure the kind of logic needed for it.

(True/1/A)

Var1 Var2 Var3 Med Mean Count 90% quantile 10% quantile
TRUE 1 A 48.5 48.5 2 77.7 19.3

I tried grouping by which has worked well for combining everything, but didn't know med/mean/count/90% quantile and 10% quantiles well. I also tried querying to isolate each combination, but couldn't solve of the best way to iterate through the unique combos. Finally for quantiles I've tried .quantile().


Solution

  • Try:

    df = df.groupby(["Var1", "Var2", "Var3"]).agg(
        Med=("Value", "median"),
        Mean=("Value", "mean"),
        Count=("Value", "count"),
        q90=("Value", lambda x: x.quantile(q=0.9)),
        q10=("Value", lambda x: x.quantile(q=0.1)),
    )
    df = df.reindex(
        pd.MultiIndex.from_product(
            [[True, False], [1, 2, 3], ["A", "B", "C"]], names=["Var1", "Var2", "Var3"]
        )
    )
    
    # print 18 dataframes:
    for _, g in df.groupby(level=[0, 1, 2]):
        print(g)
        print("-" * 80)
    

    Prints:

    
    ...
    
                      Med  Mean  Count   q90   q10
    Var1  Var2 Var3                               
    False 3    C     12.0  12.0    1.0  12.0  12.0
    --------------------------------------------------------------------------------
                     Med  Mean  Count   q90   q10
    Var1 Var2 Var3                               
    True 1    A     48.5  48.5    2.0  77.7  19.3
    --------------------------------------------------------------------------------
                    Med  Mean  Count  q90  q10
    Var1 Var2 Var3                            
    True 1    B     NaN   NaN    NaN  NaN  NaN
    
    ...