Search code examples
pythonpandasdatetimeindexfiscal

Munge tuples of yearly fiscal data with element date-labels into a time-series in Python Pandas


I am trying to convert SQL -> .csv data dump flat file into a time-series for each company.

The crux for me is that the data is organized into yearly tuples of 4 quarters. The quarter-end dates are Fiscal quarters given in the first tuple of each company. I need to munge the data into a time-series of calendar quarters, but cannot figure out how.

How can I build a datetime index from these data labels?

There are always the same number of tuples for each company, but may contain null values. The example dataframe rows below highlight 2 companies with different fiscal year end dates

eps_tuples[300:400]

        Name    Ticker  Field   Year    Qtr 1   Qtr 2   Qtr 3   Qtr 4   FY
953     Accuray Inc     ARAY    EPS     Year    Sep.30  Dec.31  Mar.31  Jun.30  Full Year
943     Accuray Inc     ARAY    EPS     2012    -0.38   -0.15   -0.21   -0.28   -1.02
944     Accuray Inc     ARAY    EPS     2013    -0.31   -0.35   -0.42   -0.25   -1.33
945     Accuray Inc     ARAY    EPS     2014    -0.21   -0.07   -0.06   -0.13   -0.47
946     Accuray Inc     ARAY    EPS     2015    -0.27   -0.13   -0.04   -0.07   -0.51
947     Accuray Inc     ARAY    EPS     2016    -0.12               
960     Accuride Corp   ACW     EPS     Year    Mar.31  Jun.30  Sep.30  Dec.31  Full Year
961     Accuride Corp   ACW     EPS     2012    -0.06   -0.02   -0.37   -0.47   -0.92
962     Accuride Corp   ACW     EPS     2013    -0.31   -0.11   -0.18   0.04    -0.56
963     Accuride Corp   ACW     EPS     2014    -0.07   0.11    0.02    -0.10   -0.04
964     Accuride Corp   ACW     EPS     2015    -0.01   0.13    0.04    -0.05   0.11
965     Accuride Corp   ACW     EPS     2016    0.02    0.11    0.04        

I've started with slicing the df to get just the quarterly date labels

eps_tuples[eps_tuples['FY']=='Full Year'][42:47]

    Name    Ticker  Field   Year    Qtr 1   Qtr 2   Qtr 3   Qtr 4   FY
906     ACCO Brands     ACCO    EPS     Year    Mar.31  Jun.30  Sep.30  Dec.31  Full Year
924     Accretive Healt ACHI    EPS     Year    Mar.31  Jun.30  Sep.30  Dec.31  Full Year
942     Accuray Inc     ARAY    EPS     Year    Sep.30  Dec.31  Mar.31  Jun.30  Full Year
960     Accuride Corp   ACW     EPS     Year    Mar.31  Jun.30  Sep.30  Dec.31  Full Year
978     ACE Limited     ACE     EPS     Year    Mar.31  Jun.30  Sep.30  Dec.31  Full Year

Then I would usually make a pandas date range something like

rng=pd.date_range(end='2016-12-31',freq='Q',periods=20)

I am stumped because there are max 20 periods, possibly less, and the beginning and end are dynamically defined by the first tuple and not tied to the position in the tuple as one would assume if the tuples were calendar years.

How should I proceed?


Solution

  • You need to create the dates from the data you have instead of defining the range. That requires reshaping your data. Not the snappiest, but this seems to work on your sample data:

    import datetime
    
    ts_dict = {}
    for company in df['Name'].unique():
        tmpdf = df[df['Name'] == company][['Year', 'Qtr 1', 'Qtr 2', 'Qtr 3', 'Qtr 4']].reset_index(drop=True)
        tmpdf.columns = tmpdf.iloc[0]
        tmpdf = tmpdf.drop(0).set_index('Year').unstack().reset_index(name=company)
        tmpdf.index = (tmpdf['Year'].apply(str) + tmpdf[0]).apply(lambda x: datetime.datetime.strptime(x, "%Y%b.%d"))
        ts_dict[company] = tmpdf[company]
    pd.DataFrame.from_dict(ts_dict)
    

    giving

        Accuray Inc Accuride Corp
    2012-03-31  -0.21   -0.06
    2012-06-30  -0.28   -0.02
    2012-09-30  -0.38   -0.37
    2012-12-31  -0.15   -0.47
    2013-03-31  -0.42   -0.31
    2013-06-30  -0.25   -0.11
    2013-09-30  -0.31   -0.18
    2013-12-31  -0.35   0.04
    2014-03-31  -0.06   -0.07
    2014-06-30  -0.13   0.11
    2014-09-30  -0.21   0.02
    2014-12-31  -0.07   -0.1
    2015-03-31  -0.04   -0.01
    2015-06-30  -0.07   0.13
    2015-09-30  -0.27   0.04
    2015-12-31  -0.13   -0.05
    2016-03-31  NaN 0.02
    2016-06-30  NaN 0.11
    2016-09-30  -0.12   0.04
    2016-12-31  NaN NaN