Search code examples
pythondataframesortingformattingpivot-table

Get values of conditionals from different columns in dataframe (tried pivottable)


I want to essentially make a pivot table but have it sorted very specifically. For example I want it to calculate the mean when Var2=C|Var3=2 and Var4=0. Wasn't sure how to incorporate 'or' or 'and' intot he pivottable function. Put an example table below for the input and outputs:

Input:

Entry Data Var2 Var3 Var4 Var5
A 1 C 3 0 B
B 1 A 2 1 B
C 2 B 1 1 B
D 2 C 3 0 A
E 3 B 2 0 C
F 3 B 1 1 A
G 4 A 3 0 A
H 4 C 2 0 C

Output: Averages

Avg Var3=3 or Var3=2 Var2=B Var2=A or Var5=B
Var4=0 2.8 3 2.5
Var4=1 1 2.5 1.5

Medians

Med Var3=3 or Var3=2 Var2=B Var2=A or Var5=B
Var4=0 3 3 2.5
Var4=1 1 2.5 1.5

Tried this to sort specific values, but it doesn't really do the same thing that I want: Python Pandas Dataframe Pivot Table Column and Values Order Was looking more for the 'or' function to pivot on multiple conditionals.


Solution

  • You can try:

    def my_func(g):
        avg_1 = g.query("Var3 == 3 or Var3 == 2")["Data"].mean()
        avg_2 = g.query("Var2 == 'B'")["Data"].mean()
        avg_3 = g.query("Var2 == 'A' or Var5 == 'B'")["Data"].mean()
    
        med_1 = g.query("Var3 == 3 or Var3 == 2")["Data"].median()
        med_2 = g.query("Var2 == 'B'")["Data"].median()
        med_3 = g.query("Var2 == 'A' or Var5 == 'B'")["Data"].median()
    
        return pd.Series(
            [avg_1, avg_2, avg_3, med_1, med_2, med_3],
            index=["Avg1", "Avg2", "Avg3", "Med1", "Med2", "Med3"],
        )
    
    
    out = df.groupby("Var4").apply(my_func)
    print(out)
    

    Prints:

          Avg1  Avg2  Avg3  Med1  Med2  Med3
    Var4                                    
    0      2.8   3.0   2.5   3.0   3.0   2.5
    1      1.0   2.5   1.5   1.0   2.5   1.5