I want to essentially make a pivot table but have it sorted very specifically. For example I want it to calculate the mean when Var2=C|Var3=2 and Var4=0. Wasn't sure how to incorporate 'or' or 'and' intot he pivottable function. Put an example table below for the input and outputs:
Input:
Entry | Data | Var2 | Var3 | Var4 | Var5 |
---|---|---|---|---|---|
A | 1 | C | 3 | 0 | B |
B | 1 | A | 2 | 1 | B |
C | 2 | B | 1 | 1 | B |
D | 2 | C | 3 | 0 | A |
E | 3 | B | 2 | 0 | C |
F | 3 | B | 1 | 1 | A |
G | 4 | A | 3 | 0 | A |
H | 4 | C | 2 | 0 | C |
Output: Averages
Avg | Var3=3 or Var3=2 | Var2=B | Var2=A or Var5=B |
---|---|---|---|
Var4=0 | 2.8 | 3 | 2.5 |
Var4=1 | 1 | 2.5 | 1.5 |
Medians
Med | Var3=3 or Var3=2 | Var2=B | Var2=A or Var5=B |
---|---|---|---|
Var4=0 | 3 | 3 | 2.5 |
Var4=1 | 1 | 2.5 | 1.5 |
Tried this to sort specific values, but it doesn't really do the same thing that I want: Python Pandas Dataframe Pivot Table Column and Values Order Was looking more for the 'or' function to pivot on multiple conditionals.
You can try:
def my_func(g):
avg_1 = g.query("Var3 == 3 or Var3 == 2")["Data"].mean()
avg_2 = g.query("Var2 == 'B'")["Data"].mean()
avg_3 = g.query("Var2 == 'A' or Var5 == 'B'")["Data"].mean()
med_1 = g.query("Var3 == 3 or Var3 == 2")["Data"].median()
med_2 = g.query("Var2 == 'B'")["Data"].median()
med_3 = g.query("Var2 == 'A' or Var5 == 'B'")["Data"].median()
return pd.Series(
[avg_1, avg_2, avg_3, med_1, med_2, med_3],
index=["Avg1", "Avg2", "Avg3", "Med1", "Med2", "Med3"],
)
out = df.groupby("Var4").apply(my_func)
print(out)
Prints:
Avg1 Avg2 Avg3 Med1 Med2 Med3
Var4
0 2.8 3.0 2.5 3.0 3.0 2.5
1 1.0 2.5 1.5 1.0 2.5 1.5