Search code examples
pandasdataframepandas-groupby

Data Frame % column by groupping


I am working on a forecast accuracy report which measure the deviation between actual & pervious projection. The measurement would be = 1- ('Actual' - 'M-1') / 'Actual' . There measure need to be groupped based different gratuity, say 'Product Category' / 'Line' / 'Product'. However, the df.groupby('Product Category').sum() function couldnt support the percentage calculation. Does anyone have idea how it should be fixed? Thanks!

data = {
    "Product Category": ['Drink', 'Drink','Drink','Food','Food','Food'],
    "Line": ['Water', 'Water','Wine','Fruit','Fruit','Fruit'],
    "Product": ['A', 'B', 'C','D','E','F'],
    "Actual": [100,50,40,20,70,50],
    "M-1": [120,40,10,20,80,50],
}
df = pd.DataFrame(data)

df['M1 Gap'] = df['Actual'] - df['M-1']
df['Error_Per'] =  1- df['M1 Gap'] / df['Actual']

Expected output would be enter image description here


Solution

  • You can also create a custom function and apply it on every row of a pandas data frame as follows. Just note that I set the axis argument to 1 so that the custom function is applied on each row or across columns:

    import pandas as pd
    
    def func(row):
        row['M1 Gap'] = row['Actual'] - row['M-1']
        row['Error_Per'] = 1 - (row['M1 Gap'] / row['Actual'])
        return row
    
    
    df.groupby('Product Category').sum().apply(func, axis = 1)
    
    
                      Actual    M-1  M1 Gap  Error_Per
    Product Category                                  
    Drink              190.0  170.0    20.0   0.894737
    Food               140.0  150.0   -10.0   1.071429