I am currently working with financial data and I am stuck on the last step of my analysis. I have a large panel data with many different companies and their returns throughout the the years, and the values across the years for the same company are not ordered. Similar to something like this:
year ticker return_y
0 1985 VLID -0.5838
1 1985 KO 0.3245
2 1994 CTL -0.3063
3 1996 DRYR -0.1607
.. ... ... ...
1356 2002 CHUX -0.2456
1357 1987 HRL -0.0233
1358 2015 KO 0.2343
.. ... ... ...
56798 2017 AFMXF 0.0558
56799 2014 TER 0.0134
What I have been trying to do is to create a separate data frame which sums the returns for each company on a 3 year rolling basis, and that if a year is missing (e.g. we have 1999 and 2001 but not 2000) restarts the rolling from the next point. Therefore something like this:
3_years ticker return_y
0 1985-1987 VLID -0.0245
1 1986-1988 VLID 0.0366
2 1987-1989 VLID -0.0421
3 1993-1995 VLID -0.1607
.. ... ... ...
12569 2008-2010 AFMXF 0.2349
12570 2012-2015 AFMXF 0.1112
Thank you in advance!
Let's say your data frame is called df
.
In short:
You would do something like that:
# ts stands for time on y-axis and security on x-axys
returns_ts = df.pivot(index='year', columns='ticker', values='return_y')
rolling_returns_ts = returns_ts.rolling(3).sum()
This will give you the rolling returns in a different format (one I find easier to work with).
To get them back to the format you have described you can:
old_format_rolling_returns = rolling_returns_ts.stack().reset_index()
Note that this assumes that the years
axis needs to be a datetime
format.
Also note that for the conversion you want, then you need to change years
format again to the one you like.