I have a data frame like this
col1 col2 col3 col4 col5
A A1 X 1 2
A A2 Y 2 2
A A3 Z 1 2
B B1 X 2 2
B B2 Y 2 2
B B3 Z 1 2
And i have a mapping like this.
mapping
col1 col3
A --> X
A --> Y
(A,B) --> Z
(A,B) --> (Y,Z)
This mapping works like a filter option that we use in excel. for eg: For A-->X, we'll filter A from col1 and filter X from col3, and get the sum from col4
for (A,B)-->Z, we'll filter both A & B from col1 and Z from col3, whatever sum we get from col4 should be stored.
I have tried groupby and pivot_table methods but couldn't be able to cover all the cases in mapping.
df1 = df.pivot_table(index= [col1,col3], columns = col3, values = col4, aggfunc='sum')
My expected result is.
col1 X Y Z YZ
A 1 2 2 6
B 2 6
Is there any way to get this using groupy or pivot_table, please let me know.
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"col1": ["A", "A", "A", "B", "B", "B"],
"col2": ["A1", "A2", "A3", "B1", "B2", "B3"],
"col3": ["X", "Y", "Z", "X", "Y", "Z"],
"col4": [1, 2, 1, 2, 2, 1],
"col5": [2, 2, 2, 2, 2, 2],
}
)
Here is one way to do it with Pandas groupby and advanced indexing:
# Get values
df = df.groupby(["col1", "col3"]).agg(list)["col4"].apply(lambda x: x[0])
print(df)
# Output
col1 col3
A X 1
Y 2
Z 1
B X 2
Y 2
Z 1
# Assign values to new dataframe
new_df = pd.DataFrame()
for i in ("X", "Y"):
new_df.at["A", i] = df.loc[("A", i)]
for i in ("A", "B"):
new_df.at[i, "Z"] = df.loc["A":"B", "Z", :].reset_index().sum()["col4"]
new_df.at[i, "YZ"] = df.loc["A":"B", "Y":"Z", :].reset_index().sum()["col4"]
# Cleanup
new_df = new_df.fillna(0).astype(int).replace(0, "")
new_df.columns.name = "col1"
Then:
print(new_df)
# Output
col1 X Y Z YZ
A 1 2 2 6
B 2 6