I'm trying to put together a generic piece of code that would:
Take a time series for some price data and divide it into deciles, e.g. take the past 18m of gold prices and divide it into deciles [DONE, see below]
date 4. close decile
2017-01-03 1158.2 0
2017-01-04 1166.5 1
2017-01-05 1181.4 2
2017-01-06 1175.7 1
... ...
2018-04-23 1326.0 7
2018-04-24 1333.2 8
2018-04-25 1327.2 7
[374 rows x 2 columns]
Pull out the dates for a particular decile, then create a secondary datelist with an added 30 days
#So far only for a single decile at a time
firstdecile = gold.loc[gold['decile'] == 1]
datelist = list(pd.to_datetime(firstdecile.index))
datelist2 = list(pd.to_datetime(firstdecile.index) + pd.DateOffset(months=1))
Take an average of those 30-day price returns for each decile
level1 = gold.ix[datelist]
level2 = gold.ix[datelist2]
level2.index = level2.index - pd.DateOffset(months=1)
result = pd.merge(level1,level2, how='inner', left_index=True, right_index=True)
def ret(one, two):
return (two - one)/one
pricereturns = result.apply(lambda x :ret(x['4. close_x'], x['4. close_y']), axis=1)
mean = pricereturns.mean()
Return the list of all 10 averages in a single CSV file
So far I've been able to put together something functional that does steps 1-3 but only for a single decile, but I'm struggling to expand this to a looped-code for all 10 deciles at once with a clean CSV output
First append the close price at t + 1 month
as a new column on the whole dataframe.
gold2_close = gold.loc[gold.index + pd.DateOffset(months=1), 'close']
gold2_close.index = gold.index
gold['close+1m'] = gold2_close
However practically relevant should be the number of trading days, i.e. you won't have prices for the weekend or holidays. So I'd suggest you shift by number of rows, not by daterange, i.e. the next 20 trading days
gold['close+20'] = gold['close'].shift(periods=-20)
Now calculate the expected return for each row
gold['ret'] = (gold['close+20'] - gold['close']) / gold['close']
You can also combine steps 1. and 2. directly so you don't need the additional column (only if you shift by number of rows, not by fixed daterange due to reindexing)
gold['ret'] = (gold['close'].shift(periods=-20) - gold['close']) / gold['close']
Since you already have your deciles, you just need to groupby
the deciles and aggregate the returns with mean()
gold_grouped = gold.groupby(by="decile").mean()
Putting in some random data you get something like the dataframe below. close
and ret
are the averages for each decile. You can create a csv from a dataframe via pandas.DataFrame.to_csv
close ret
decile
0 1238.343597 -0.018290
1 1245.663315 0.023657
2 1254.073343 -0.025934
3 1195.941312 0.009938
4 1212.394511 0.002616
5 1245.961831 -0.047414
6 1200.676333 0.049512
7 1181.179956 0.059099
8 1214.438133 0.039242
9 1203.060985 0.029938