Search code examples
pandasrowcalculated-columnscalculation

Calculate Monthly Variability by Long-Term Average Pandas


i have data that looks like this in dataframe "YM"showing the values for the year 2019 for months 8-12:

     year    month  adjusted_power
343  2019    08     20754.20750
344  2019    09     20305.95125
345  2019    10     18428.88125
346  2019    11     27864.02500
347  2019    12     25405.17500

And, I have monthly averages in a dataframe called "YMmonthmean" that looks like this -

      month  adjusted_power_AVERAGE
7     08    26161.086034
8     09    20707.108319
9     10    25684.728190
10    11    29468.227759
11    12    29509.313319

I'm trying to calculate the variability of each "year" value using the adjusted_power_AVERAGE column so that the result looks like this -

required output

I have tried variations of this below with differing errors. Both the variables YM and YMmonthmean are dataframes. Thank you for your assistance.

#NEED VARIABILITY BY MONTH AND YEAR
df_final = (YM.div(YMmonthmean.loc[1], axis=1) - 1)

Error most recent: TypeError: unsupported operand type(s) for /: 'str' and 'str'


Solution

  • I'd suggest to merge both data frames first

    YM = pd.DataFrame({'year': [2019, 2019, 2019, 2019, 2019],
                        'month': ['08', '09', '10', '11', '12'],
                       'adjusted_power':[20754.20750, 20305.95125, 18428.88125, 27864.02500, 25405.17500],
    })
    YMmonthmean = pd.DataFrame({'month': ['08', '09', '10', '11', '12'],
                       'adjusted_power_AVERAGE':[26161.086034, 20707.108319, 25684.728190, 29468.227759, 29509.313319],
    })
    
    YM = YM.merge(YMmonthmean, on = 'month', how='left')
    YM['var'] = YM['adjusted_power']/ YM['adjusted_power_AVERAGE'] -1
    
    

    results in

        year    month   adjusted_power  adjusted_power_AVERAGE  var
    0   2019    08      20754.20750     26161.086034            -0.206676
    1   2019    09      20305.95125     20707.108319            -0.019373
    2   2019    10      18428.88125     25684.728190            -0.282497
    3   2019    11      27864.02500     29468.227759            -0.054438
    4   2019    12.     25405.17500     29509.313319            -0.139079
    

    If you don't want to have the additional columns in there you could drop them afterwards