python pandas dataframe row calculation

Pandas Calculate Average Bias By Rows from 2 Columns


I have a dataframe that looks like the one below, and I am trying to calculate a simple bias by comparing two columns of data, 'obsvals' and 'modelvals'. For each month I need the difference between the two (the 'Bias' column below is 'obsvals' minus 'modelvals'), and then I want to average those differences over months 1 and 2 to get a single bias per plant. I'm not sure how to do that in Python. I'm guessing some combination of groupby on 'plant_name' and maybe a lambda function?

Here is the dataframe:

     plant_name  year  month  obsvals  modelvals  Bias
0     ARIZONA I  2021      1     8.90       8.30  0.60
1     ARIZONA I  2021      2     7.98       7.41  0.57
3     CAETITE I  2021      1     9.10       7.78  1.32
4     CAETITE I  2021      2     6.05       6.02  0.03 

My final answer should look like:

     plant_name  year  Bias
0    ARIZONA I   2021   0.58
1    CAETITE I   2021   0.67

Thank you for your time.


Solution

  • If I understand correctly, you need groupby with a mean aggregation:

    # Average the existing 'Bias' column per plant and year ('mean' avoids needing numpy)
    df = df.groupby(['plant_name','year']).agg({'Bias': 'mean'}).reset_index()
    

    OUTPUT:

       plant_name  year   Bias
    0   ARIZONA I  2021  0.585
    1   CAETITE I  2021  0.675
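
For completeness, here is a self-contained sketch of the same approach that starts from the raw columns. It assumes the 'Bias' column does not exist yet and defines it as 'obsvals' minus 'modelvals', which is the sign convention the question's table appears to use. The name `result` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the precomputed 'Bias' column is
# deliberately left out so it can be derived below.
df = pd.DataFrame({
    "plant_name": ["ARIZONA I", "ARIZONA I", "CAETITE I", "CAETITE I"],
    "year": [2021, 2021, 2021, 2021],
    "month": [1, 2, 1, 2],
    "obsvals": [8.90, 7.98, 9.10, 6.05],
    "modelvals": [8.30, 7.41, 7.78, 6.02],
})

# Per-row bias. Assumption: bias = obsvals - modelvals, matching the table.
df["Bias"] = df["obsvals"] - df["modelvals"]

# Mean bias per plant and year; as_index=False keeps the group keys as columns.
result = df.groupby(["plant_name", "year"], as_index=False)["Bias"].mean()

# Round only for display, so floating-point noise does not clutter the output.
print(result.round(3))
```

If a cumulative bias is wanted instead of an average, replacing `.mean()` with `.sum()` in the groupby line gives the total over the months in each group.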