Search code examples
pandasindexingcalculationelementwise-operations

Name-Specific Variability Calculations Pandas


I'm trying to calculate variability statistics from two df's - one with current data and one df with average data for the month. Suppose I have a df "DF1" that looks like this:

    Name year   month   output
0   A   1991    1       10864.8
1   A   1997    2       11168.5
2   B   1994    1       6769.2
3   B   1998    2       3137.91
4   B   2002    3       4965.21

and a df called "DF2" that contains monthly averages from multiple years such as:

    Name month  output_average
0   A   1       11785.199
1   A   2       8973.991
2   B   1       8874.113
3   B   2       6132.176667
4   B   3       3018.768

and, i need a new DF calling it "DF3" that needs to look like this with the calculations specific to the change in the "name" column and for each "month" change:

    Name year  month   Variability
0   A    1991  1      -0.078097875
1   A    1997  2       0.24454103
2   B    1994  1      -0.237197002
3   B    1998  2      -0.488287737
4   B    2002  3       0.644782

I have tried options like this below but with errors about duplicating the axis or key errors -

DF3['variability'] = 
((DF1.output/DF2.set_index('month'['output_average'].reindex(DF1['name']).values)-1)

Thank you for your help in leaning Python row calculations coming from matlab!


Solution

  • For two columns, you can better use merge instead of set_index:

    df3 = df1.merge(df2, on=['Name','month'], how='left')
    df3['variability'] = df3['output']/df3['output_average'] - 1
    

    Output:

      Name  year  month    output  output_average  variability
    0    A  1991      1  10864.80    11785.199000    -0.078098
    1    A  1997      2  11168.50     8973.991000     0.244541
    2    B  1994      1   6769.20     8874.113000    -0.237197
    3    B  1998      2   3137.91     6132.176667    -0.488288
    4    B  2002      3   4965.21     3018.768000     0.644780