python pandas data-science finance panel-data

Pandas Panel Data - Identifying year gap and calculating returns


I am working with a large panel dataset of financial information, but the values are a bit spotty. I am trying to calculate the year-over-year return for each stock in the panel. However, because of missing values, some firms have gaps of several years, so a plain df['stock_ret'] = df.groupby(['tic'])['stock_price'].pct_change() does not work: it would compute a "return" across the gap as if the two rows were consecutive years. The df looks something like this (just giving an example):

       datadate      month     fyear    ticker    price
0    31/12/1998         12      1998      AAPL   188.92
1    31/12/1999         12      1999      AAPL   197.44
2    31/12/2002         12      2002      AAPL   268.13
3    31/12/2003         12      2003      AAPL   278.06
4    31/12/2004         12      2004      AAPL   288.35
5    31/12/2005         12      2005      AAPL   312.23
6    31/05/2008          5      2008      TSLA    45.67
7    31/05/2009          5      2009      TSLA    38.29
8    31/05/2010          5      2010      TSLA    42.89
9    31/05/2011          5      2011      TSLA    56.03
10   31/05/2014          5      2014      TSLA   103.45
..       ...            ..       ..        ..      ..

What I am looking for is a piece of code that detects, for each individual firm, whether there is a gap in the data, and then calculates returns separately for each unbroken series of years. Just like this:

       datadate      month     fyear    ticker    price   return
0    31/12/1998         12      1998      AAPL   188.92      NaN 
1    31/12/1999         12      1999      AAPL   197.44   0.0451
2    31/12/2002         12      2002      AAPL   268.13      NaN
3    31/12/2003         12      2003      AAPL   278.06   0.0370
4    31/12/2004         12      2004      AAPL   288.35   0.0370
5    31/12/2005         12      2005      AAPL   312.23   0.0828
6    31/05/2008          5      2008      TSLA    45.67      NaN
7    31/05/2009          5      2009      TSLA    38.29  -0.1616
8    31/05/2010          5      2010      TSLA    42.89   0.1201
9    31/05/2011          5      2011      TSLA    56.03   0.3064
10   31/05/2014          5      2014      TSLA   103.45      NaN
..       ...            ..       ..        ..      ..

If you have any other suggestions on how to treat this problem, please feel free to share your knowledge :) I am a bit inexperienced so I am sure that your advice could help!

Thank you in advance guys!


Solution

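  • You can create a mask that tells whether the previous row for the same ticker is exactly one fiscal year earlier, and fill in the percentage change only for those rows (this assumes the rows are sorted by ticker and fyear, as in the example):

    import numpy as np

    df['return'] = np.nan
    # True only where the previous row of the same ticker is exactly one year earlier
    mask = df.groupby('ticker')['fyear'].diff().eq(1)
    df.loc[mask, 'return'] = df.groupby('ticker')['price'].pct_change()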
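
The mask idea above can be sketched end to end on the sample rows from the question. This is only a minimal illustration, assuming one row per ticker and fiscal year; the `year_step` variable and the `run` column are hypothetical names introduced here, not part of the original data.

```python
import pandas as pd

# Sample rows copied from the question (fiscal year, ticker, price only).
df = pd.DataFrame({
    "fyear":  [1998, 1999, 2002, 2003, 2004, 2005, 2008, 2009, 2010, 2011, 2014],
    "ticker": ["AAPL"] * 6 + ["TSLA"] * 5,
    "price":  [188.92, 197.44, 268.13, 278.06, 288.35, 312.23,
               45.67, 38.29, 42.89, 56.03, 103.45],
})

# The approach relies on row order, so sort within each ticker first.
df = df.sort_values(["ticker", "fyear"]).reset_index(drop=True)

# Year difference to the previous row of the same ticker (NaN on the first row).
year_step = df.groupby("ticker")["fyear"].diff()

# Keep the plain percentage change only where the previous row is exactly
# one fiscal year earlier; every other row becomes NaN.
pct = df.groupby("ticker")["price"].pct_change()
df["return"] = pct.where(year_step.eq(1))

# Optional: number each unbroken run of years per ticker ("run" is a
# hypothetical column name), which makes the gaps themselves easy to inspect.
df["run"] = year_step.ne(1).astype(int).groupby(df["ticker"]).cumsum()

print(df.round({"return": 4}))
```

With the `run` column in place, grouping by both `ticker` and `run` before calling `pct_change()` should give the same returns, since every run starts with its own NaN.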