Search code examples
pythonpandasnumpyfinance

Creating rolling aggregate values in data frames Pandas


I am currently working with financial data and I am stuck on the last step of my analysis. I have a large panel data with many different companies and their returns throughout the the years, and the values across the years for the same company are not ordered. Similar to something like this:

        year     ticker     return_y
0       1985      VLID       -0.5838
1       1985        KO        0.3245
2       1994       CTL       -0.3063
3       1996      DRYR       -0.1607
..       ...       ...           ...
1356    2002      CHUX       -0.2456
1357    1987       HRL       -0.0233
1358    2015        KO        0.2343
..       ...       ...           ...
56798   2017      AFMXF       0.0558
56799   2014        TER       0.0134

What I have been trying to do is to create a separate data frame which sums the returns for each company on a 3 year rolling basis, and that if a year is missing (e.g. we have 1999 and 2001 but not 2000) restarts the rolling from the next point. Therefore something like this:

        3_years      ticker     return_y
0       1985-1987      VLID      -0.0245
1       1986-1988      VLID       0.0366
2       1987-1989      VLID      -0.0421
3       1993-1995      VLID      -0.1607
..       ...            ...          ...
12569   2008-2010     AFMXF       0.2349
12570   2012-2015     AFMXF       0.1112

Thank you in advance!


Solution

  • Let's say your data frame is called df.

    In short:

    You would do something like that:

    # ts stands for time on y-axis and security on x-axys
    returns_ts = df.pivot(index='year', columns='ticker', values='return_y')
    rolling_returns_ts = returns_ts.rolling(3).sum()
    

    This will give you the rolling returns in a different format (one I find easier to work with).

    To get them back to the format you have described you can:

    old_format_rolling_returns = rolling_returns_ts.stack().reset_index()
    

    Note that this assumes that the years axis needs to be a datetime format. Also note that for the conversion you want, then you need to change years format again to the one you like.