Search code examples
pythonpandasdataframetime-seriesfinancial

How to determine the difference in quarterly row values of a pandas dataframe when quarters are missing


I am trying to find the difference in quarterly balance sheet equity values from the following data frame:

import pandas as pd
import numpy as np

df2= pd.DataFrame({'FirmID'    : pd.Series(['ID001',  'ID001',  'ID001',  'ID001',  'ID001', 'ID001',  'ID001',   'ID001',  'ID001',  'ID001' ]), 
                   'RSSD9999'  : pd.Series([20060331, 20060630, 20060930, 20061231, 20070331,20070630, 20070930,  20080630, 20080930, 20081231]), 
                   'year'      : pd.Series([2006,     2006,     2006,     2006,     2007,    2007,     2007,      2008,     2008,     2008    ]),
                   'Q'         : pd.Series([1,        2,        3,        4,        1,       2,        3,         2,        3,        4       ]), 
                   'EquityEoQ' : pd.Series([112,      223,      333,      445,      126,     251,      376,       291,      291,      503     ]),
                   'NewEqRight': pd.Series([112,      111,      110,      112,      126,      125,      125,      np.nan,      0  ,      212,    ])})
df2=df2[['FirmID','RSSD9999', 'year', 'Q', 'EquityEoQ','NewEqRight']]

The frame shows equity values at the end of quarters per year: EquityEoQ. Note, NewEqRight shows the values as they should be, with missing values for 2007Q4 and 2008Q2.

I can find the change in equity per quarter by taking the difference between row values. For example, firm ID001 issued 111 new equity in 2006Q2 (111 = 223 - 112).

If all quarterly rows in the data are present, then I can use shift to create a new column with last quarter's equity (EquityEoLastQ), and another column that records the difference between EquityEoQ and EquityEoLastQ to obtain the change in equity:

df2['EquityEoLastQ'] = df2.groupby(['FirmID'])['EquityEoQ'].shift(1)
df2['NewEqWrong']     = df2['EquityEoQ']-df2['EquityEoLastQ']
df2.loc[df2['Q']==1, 'NewEqWrong'] = df2.loc[df2['Q']==1, 'EquityEoQ']

The last row corrects values for Q1.

But if quarterly rows are missing, then shift messes up. For example, in the data-frame, the rows for 2007Q4 and 2008Q1 are missing. This leads to incorrect information, because shift refers to the wrong quarter. In this frame this approach gives a negative NewEqWrong value for 2008Q2 is -85.0, which is the wrong value.

Desired data set:

In [9]: df2
Out[9]:
  FirmID  RSSD9999  year  Q  EquityEoQ  NewEqRight  EquityEoLastQ  NewEqWrong
0  ID001  20060331  2006  1        112       112.0            NaN       112.0
1  ID001  20060630  2006  2        223       111.0          112.0       111.0
2  ID001  20060930  2006  3        333       110.0          223.0       110.0
3  ID001  20061231  2006  4        445       112.0          333.0       112.0
4  ID001  20070331  2007  1        126       126.0          445.0       126.0
5  ID001  20070630  2007  2        251       125.0          126.0       125.0
6  ID001  20070930  2007  3        376       125.0          251.0       125.0
7  ID001  20080630  2008  2        291         NaN          376.0       -85.0
8  ID001  20080930  2008  3        291         0.0          291.0         0.0
9  ID001  20081231  2008  4        503       212.0          291.0       212.0

In Stata one can set the time series frequency to quarterly, and then use L. or D. to find lags and differences respectively.

Is there any way pandas can deal with this problem, more or less like Stata?


Solution

  • Basically, I realised that I want to merge data. It then works as desired:

    See this link: lag values and differences and missing quarterly data