Search code examples
pythoncsvfor-loopfinance

How to convert this to a for-loop with an output to CSV


I'm trying to put together a generic piece of code that would:

  1. Take a time series for some price data and divide it into deciles, e.g. take the past 18m of gold prices and divide it into deciles [DONE, see below]

    date        4. close   decile                
    2017-01-03    1158.2       0
    2017-01-04    1166.5       1
    2017-01-05    1181.4       2
    2017-01-06    1175.7       1
             ...     ...
    2018-04-23    1326.0       7
    2018-04-24    1333.2       8
    2018-04-25    1327.2       7
    [374 rows x 2 columns]
    
  2. Pull out the dates for a particular decile, then create a secondary datelist with an added 30 days

     #So far only for a single decile at a time
     firstdecile = gold.loc[gold['decile'] == 1]
     datelist = list(pd.to_datetime(firstdecile.index))
     datelist2 = list(pd.to_datetime(firstdecile.index) + pd.DateOffset(months=1))
    
  3. Take an average of those 30-day price returns for each decile

    level1 = gold.ix[datelist]
    level2 = gold.ix[datelist2]
    level2.index = level2.index - pd.DateOffset(months=1)
    result = pd.merge(level1,level2, how='inner', left_index=True, right_index=True)
    
    def ret(one, two):
      return (two - one)/one
    
    pricereturns = result.apply(lambda x :ret(x['4. close_x'], x['4. close_y']), axis=1)
    mean = pricereturns.mean()
    
  4. Return the list of all 10 averages in a single CSV file

So far I've been able to put together something functional that does steps 1-3 but only for a single decile, but I'm struggling to expand this to a looped-code for all 10 deciles at once with a clean CSV output


Solution

    1. First append the close price at t + 1 month as a new column on the whole dataframe.

      gold2_close = gold.loc[gold.index + pd.DateOffset(months=1), 'close']
      gold2_close.index = gold.index
      gold['close+1m'] = gold2_close
      

      However practically relevant should be the number of trading days, i.e. you won't have prices for the weekend or holidays. So I'd suggest you shift by number of rows, not by daterange, i.e. the next 20 trading days

      gold['close+20'] = gold['close'].shift(periods=-20)
      
    2. Now calculate the expected return for each row

      gold['ret'] = (gold['close+20'] - gold['close']) / gold['close']
      

      You can also combine steps 1. and 2. directly so you don't need the additional column (only if you shift by number of rows, not by fixed daterange due to reindexing)

      gold['ret'] = (gold['close'].shift(periods=-20) - gold['close']) / gold['close']
      
    3. Since you already have your deciles, you just need to groupby the deciles and aggregate the returns with mean()

      gold_grouped = gold.groupby(by="decile").mean()
      

    Putting in some random data you get something like the dataframe below. close and ret are the averages for each decile. You can create a csv from a dataframe via pandas.DataFrame.to_csv

                  close       ret
    decile                       
    0       1238.343597 -0.018290
    1       1245.663315  0.023657
    2       1254.073343 -0.025934
    3       1195.941312  0.009938
    4       1212.394511  0.002616
    5       1245.961831 -0.047414
    6       1200.676333  0.049512
    7       1181.179956  0.059099
    8       1214.438133  0.039242
    9       1203.060985  0.029938