I am trying to convert SQL -> .csv data dump flat file into a time-series for each company.
The crux for me is that the data is organized into yearly tuples of 4 quarters. The quarter-end dates are Fiscal quarters given in the first tuple of each company. I need to munge the data into a time-series of calendar quarters, but cannot figure out how.
How can I build a datetime index from these data labels?
There are always the same number of tuples for each company, but may contain null values. The example dataframe rows below highlight 2 companies with different fiscal year end dates
eps_tuples[300:400]
Name Ticker Field Year Qtr 1 Qtr 2 Qtr 3 Qtr 4 FY
953 Accuray Inc ARAY EPS Year Sep.30 Dec.31 Mar.31 Jun.30 Full Year
943 Accuray Inc ARAY EPS 2012 -0.38 -0.15 -0.21 -0.28 -1.02
944 Accuray Inc ARAY EPS 2013 -0.31 -0.35 -0.42 -0.25 -1.33
945 Accuray Inc ARAY EPS 2014 -0.21 -0.07 -0.06 -0.13 -0.47
946 Accuray Inc ARAY EPS 2015 -0.27 -0.13 -0.04 -0.07 -0.51
947 Accuray Inc ARAY EPS 2016 -0.12
960 Accuride Corp ACW EPS Year Mar.31 Jun.30 Sep.30 Dec.31 Full Year
961 Accuride Corp ACW EPS 2012 -0.06 -0.02 -0.37 -0.47 -0.92
962 Accuride Corp ACW EPS 2013 -0.31 -0.11 -0.18 0.04 -0.56
963 Accuride Corp ACW EPS 2014 -0.07 0.11 0.02 -0.10 -0.04
964 Accuride Corp ACW EPS 2015 -0.01 0.13 0.04 -0.05 0.11
965 Accuride Corp ACW EPS 2016 0.02 0.11 0.04
I've started with slicing the df to get just the quarterly date labels
eps_tuples[eps_tuples['FY']=='Full Year'][42:47]
Name Ticker Field Year Qtr 1 Qtr 2 Qtr 3 Qtr 4 FY
906 ACCO Brands ACCO EPS Year Mar.31 Jun.30 Sep.30 Dec.31 Full Year
924 Accretive Healt ACHI EPS Year Mar.31 Jun.30 Sep.30 Dec.31 Full Year
942 Accuray Inc ARAY EPS Year Sep.30 Dec.31 Mar.31 Jun.30 Full Year
960 Accuride Corp ACW EPS Year Mar.31 Jun.30 Sep.30 Dec.31 Full Year
978 ACE Limited ACE EPS Year Mar.31 Jun.30 Sep.30 Dec.31 Full Year
Then I would usually make a pandas date range something like
rng=pd.date_range(end='2016-12-31',freq='Q',periods=20)
I am stumped because there are max 20 periods, possibly less, and the beginning and end are dynamically defined by the first tuple and not tied to the position in the tuple as one would assume if the tuples were calendar years.
How should I proceed?
You need to create the dates from the data you have instead of defining the range. That requires reshaping your data. Not the snappiest, but this seems to work on your sample data:
import datetime
ts_dict = {}
for company in df['Name'].unique():
tmpdf = df[df['Name'] == company][['Year', 'Qtr 1', 'Qtr 2', 'Qtr 3', 'Qtr 4']].reset_index(drop=True)
tmpdf.columns = tmpdf.iloc[0]
tmpdf = tmpdf.drop(0).set_index('Year').unstack().reset_index(name=company)
tmpdf.index = (tmpdf['Year'].apply(str) + tmpdf[0]).apply(lambda x: datetime.datetime.strptime(x, "%Y%b.%d"))
ts_dict[company] = tmpdf[company]
pd.DataFrame.from_dict(ts_dict)
giving
Accuray Inc Accuride Corp
2012-03-31 -0.21 -0.06
2012-06-30 -0.28 -0.02
2012-09-30 -0.38 -0.37
2012-12-31 -0.15 -0.47
2013-03-31 -0.42 -0.31
2013-06-30 -0.25 -0.11
2013-09-30 -0.31 -0.18
2013-12-31 -0.35 0.04
2014-03-31 -0.06 -0.07
2014-06-30 -0.13 0.11
2014-09-30 -0.21 0.02
2014-12-31 -0.07 -0.1
2015-03-31 -0.04 -0.01
2015-06-30 -0.07 0.13
2015-09-30 -0.27 0.04
2015-12-31 -0.13 -0.05
2016-03-31 NaN 0.02
2016-06-30 NaN 0.11
2016-09-30 -0.12 0.04
2016-12-31 NaN NaN