Search code examples
pythonpandasreturnpanel

Operation across observations and year is returned NaN


I have a panel dataset with a set of countries [Italy and US] for 3 years and two numeric variables ['Var1', 'Var2']. I would like to calculate the rate of change in the last three years Ex: the value for Var1 in 2019 minus the value of Var1 in 2017 divided by Var1 in 2017.

I do not understand why my code (below) returns NaN errors?

data = {'Year':[2017, 2018, 2019, 2017, 2018, 2019], 'Country':['Italy', 'Italy', 'Italy', 'US' ,  'US', 'US'], 'Var1':[23,75,45, 32,13,14], 'Var2':[21,75,47, 30,11,18]}
trend = pd.DataFrame(data) 
list = ['Var1', 'Var2']
for col in list:
    trend[col + ' (3 Year % Change)'] = ((trend.loc[trend['Year']==2019][col]- trend.loc[trend['Year']==2017][col])/trend.loc[trend['Year']==2017][col])*100

trend

Solution

  • There are a few things going wrong here with your code:

    1. You are trying to divide pd.series and not only their arrays, and they are carrying their index, which causes the division to become NaN
    2. If you actually pass the values, for instance by using .values after the columns filters, you will bump into a ValueError because you want two values to insert into the entire DataFrame and pandas won't like that (length should be the same). This exemplifies it:
    trend.loc['Var1' + ' (3 Year % Change)'] = ((trend.loc[trend['Year']==2019, 'Var1'].values - \
                                            trend.loc[trend['Year']==2017, 'Var1'].values)/\
                                            trend.loc[trend['Year']==2017, 'Var1'].values)*100
    

    ValueError: cannot set a row with mismatched columns

    1. Not sure if you are using list as an actual variable name, but that is a reserved python word. It is not the best idea. You can read about it here

    If you want to compare the values with 2017 values in your sample, you can use groupby+shift, based on how many years to shift:

    for col in ['Var1','Var2']:
        trend[col + ' (3 Year % Change)'] = (trend[col] - trend.groupby('Country').shift(2)[col])/trend.groupby('Country').shift(2)[col]
    
    Out[1]:
       Year Country  Var1  Var2  Var1 (3 Year % Change)  Var2 (3 Year % Change)
    0  2017   Italy    23    21                     NaN                     NaN
    1  2018   Italy    75    75                     NaN                     NaN
    2  2019   Italy    45    47                0.956522                1.238095
    3  2017      US    32    30                     NaN                     NaN
    4  2018      US    13    11                     NaN                     NaN
    5  2019      US    14    18               -0.562500               -0.400000