I am conducting an event study and would like to pack the results of a function into an easy manageable form. The below function returns results in the form of:
Results are for one index for one event window (-5,+5 days) analysis. I have 17 indices and over 120 event days that I need to analyse which makes for one big output on which I have to perform further computations.
I have tried packing results into a dictionary, but the outputs are multi-level dictionaries which I do not know how to unpack into meaningful shape.
I was thinking about appending the results into a np.ndarray or alternating the outputs of the function so that it takes lists/series/arrays of values as input and then aggregates the outputs.
Unfortunately this is beyond my skill level, as I am fairly new to Python, but I am trying to learn as much as possible so any advice/help would be greatly appreciated.
FUNCTION:
def get_cum_returns(prices, sid, date, days_before, days_after, benchmark_sid):
"""
Calculates cumulative and abnormal returns for the sid & benchmark
Parameters
----------
prices : pd.DataFrame
Pricing history DataFrame obtained from `get_pricing`. Index should
be the datetime index and sids should be columns.
sid : int or zipline.assets._assets.Equity object
Security that returns are being calculated for.
date : datetime object
Date that will be used as t=0 for cumulative return calcuations. All
returns will be calculated around this date.
days_before, days_after : int
Days before/after to be used to calculate returns for.
benchmark : int or zipline.assets._assets.Equity object
Returns
-------
sid_returns : pd.Series
Cumulative returns time series from days_before ~ days_after from date
for sid
benchmark_returns : pd.Series
Cumulative returns time series for benchmark sid
abnormal_returns : pd.Series
Abnomral cumulative returns time series for sid compared against benchmark
"""
day_zero_index = prices.index.searchsorted(date)
starting_index = max(day_zero_index - days_before, 0)
ending_index = min(day_zero_index + days_after + 1, len(prices.index) - 1)
if starting_index < 0 or ending_index >= len(prices.index):
return None
if sid == benchmark_sid:
temp_price = prices.iloc[starting_index:ending_index,:].loc[:,[sid]]
else:
temp_price = prices.iloc[starting_index:ending_index,:].loc[:,[sid, benchmark_sid]]
beta = calc_beta(sid, benchmark_sid, temp_price)
if beta is None:
return
daily_ret = temp_price.pct_change().fillna(0)
daily_ret['abnormal_returns'] = daily_ret[sid] - beta*daily_ret[benchmark_sid]
cum_returns = (daily_ret + 1).cumprod() - 1
try:
# If there's not enough data for event study,
# return None
cum_returns.index = range(starting_index - day_zero_index,
ending_index - day_zero_index)
except:
return None
sid_returns = cum_returns[sid] - cum_returns[sid].iloc[0]
bench_returns = cum_returns[benchmark_sid] - cum_returns[benchmark_sid].iloc[0]
abnormal_returns = cum_returns['abnormal_returns'] - cum_returns['abnormal_returns'].iloc[0]
return sid_returns, bench_returns, abnormal_returns
Many thanks,
Am I right, that you output looks like this?
array = [pd.Series([1, 2, 3, 4, 5], index = [-3, -2, -1, 0, 1], name='first'),
pd.Series([11, 22, 33, 44, 55, 66], index = [-4, -3, -2, -1, 0, 1], name='second'),
pd.Series([111, 222, 333, 444, 555, 666, 777], index = [-4, -3, -2, -1, 0, 1, 2], name='last')]
Out:
[-3 1
-2 2
-1 3
0 4
1 5
Name: first, dtype: int64, -4 11
-3 22
-2 33
-1 44
0 55
1 66
Name: second, dtype: int64, -4 111
-3 222
-2 333
-1 444
0 555
1 666
2 777
Name: last, dtype: int64]
If I'm right, you can use pd.concat to concatenate series.
df = pd.concat(array, axis=1)
Out:
first second last
-4 NaN 11.0 111
-3 1.0 22.0 222
-2 2.0 33.0 333
-1 3.0 44.0 444
0 4.0 55.0 555
1 5.0 66.0 666
2 NaN NaN 777
After that you can fill NaN values if you want with df.fillna(value)
, where value should be a number. If you want, you can get a matrix:
pd.concat(array, axis=1).fillna(0).as_matrix()
Out:
array([[ 0., 11., 111.],
[ 1., 22., 222.],
[ 2., 33., 333.],
[ 3., 44., 444.],
[ 4., 55., 555.],
[ 5., 66., 666.],
[ 0., 0., 777.]])
I hope it helps.
UPD:
Maybe for you task will be better have a table like (day, value, id).
array = [pd.DataFrame(data=series.rename('value')).assign(ID = i) for i, series in enumerate(array)]
Out:
[ value ID
-3 1 0
-2 2 0
-1 3 0
0 4 0
1 5 0, value ID
-4 11 1
-3 22 1
-2 33 1
-1 44 1
0 55 1
1 66 1, value ID
-4 111 2
-3 222 2
-2 333 2
-1 444 2
0 555 2
1 666 2
2 777 2]
table = pd.concat(array, axis=0).reset_index().rename(columns={'index': 'day'})
Out:
day value ID
0 -3 1 0
1 -2 2 0
2 -1 3 0
3 0 4 0
4 1 5 0
5 -4 11 1
6 -3 22 1
7 -2 33 1
8 -1 44 1
9 0 55 1
10 1 66 1
11 -4 111 2
12 -3 222 2
13 -2 333 2
14 -1 444 2
15 0 555 2
16 1 666 2
17 2 777 2
After that you can group by day or ID and do what you want.