python-3.x dataframe statistics algorithmic-trading portfolio

Implementing a cointegration portfolio in Python for 3 ETFs (EWA, EWC, IGE)

I'm trying to implement a mean-reverting portfolio using the strategies described in "Algorithmic Trading" by Dr. P.E. Chan. However, since the examples he uses are programmed in MATLAB, I'm having trouble translating them correctly to Python. I'm completely stuck trying to create a cointegrating portfolio using 3 ETFs. I think my problems begin when trying to determine the hedges, and then building the desired portfolio.

Any help or tips would be enormously useful.

So, I start by downloading the Adjusted prices and creating the W, X and Y Data Series. The time period I selected is 2007/07/22 through 2012/3/28.

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
%matplotlib inline

import statsmodels.api as sm

import datetime

start = datetime.datetime(2007, 7, 22)
end = datetime.datetime(2012, 3, 28)
EWA = web.DataReader('EWA', 'yahoo', start, end)
EWC = web.DataReader('EWC', 'yahoo', start, end)
IGE = web.DataReader('IGE', 'yahoo', start, end)

w = IGE['Adj Close']
x = EWA['Adj Close']
y = EWC['Adj Close']

df = pd.DataFrame([w,x,y]).transpose()
df.columns = ['W','X','Y']
df.plot(figsize=(20,12))

from statsmodels.tsa.vector_ar.vecm import coint_johansen

y3 = df

j_results = coint_johansen(y3,0,1)

print(j_results.lr1)                           
print(j_results.cvt)                           
print(j_results.eig)
print(j_results.evec)
print(j_results.evec[:,0])

So then I'm supposed to build a portfolio by multiplying the eigenvector [0.30.., 1.36.., -1.35..] times the share prices of each instrument to get the y_port value. Afterwards I run a correlation test to determine the correlation between daily change in price of this portfolio and the last day's price change, to be able to determine the half-life for the series.

I did this by just multiplying the eigenvector times the close prices, I don't know if this is where I went wrong.

    hedge_ratios = j_results.evec[:,0]
    y_port = (hedge_ratios * df).sum(axis=1)
    y_port.plot(figsize=(20,12))

    y_port_lag = y_port.shift(1)
    y_port_lag[0]= 0
    delta_y = y_port-y_port_lag

    X = y_port_lag
    Y = delta_y
    X = sm.add_constant(X)

    model = OLS(Y,X)
    regression_results  = model.fit()
    regression_results.summary()

So then I calculate the half-life, which is around 19 days.

halflife = -np.log(2)/regression_results.params[0]
halflife

And I define the number of units to hold based on the instructions on the book (the -Z value of the portfolio value, with a lookback window of 19 days based on the half-life).

num_units = -(y_port-y_port.rolling(19).mean())/y_port.rolling(19).std()
num_units.plot(figsize=(20,12))

So the next steps I take are:

Check to see if the dataframe is still correct.
Add the "Number of units to hold", which was calculated previously and is the negative Z score of the y_port value.
There was probably an easier way to multiply or do this, but I calculated the amount of $ I should hold for each instrument by multiplying the instrument price, by the hedge ratio given by the eigenvector, by the number of portfolio units to hold.
Finally I calculated each instrument's PNL by multiplying the daily change * the number of units I was holding.

The results are abysmal. Just losing all the way from beginning to end. ¿Where did I mess up? ¿how can I properly multiply the values in the eigenvector, determine the number of positions to hold, and create the portfolio correctly?

Any assistance would be massively appreciated.

I don't know why but the num_units series was "Horizontal" and I had to transpose it before attaching it to the DataFrame.

num_units = num_units.transpose()
df['Portfolio Units'] = num_units
df

df['W $ Units'] = df['W']*hedge_ratios[0]*df['Portfolio Units']
df['X $ Units'] = df['X']*hedge_ratios[1]*df['Portfolio Units']
df['Y $ Units'] = df['Y']*hedge_ratios[2]*df['Portfolio Units']

positions = df[['W $ Units','X $ Units','Y $ Units']]
positions

pnl = pd.DataFrame()

pnl['W Pnl'] = (df['W']/df['W'].shift(1)-1)*df['W $ Units']
pnl['X Pnl'] = (df['X']/df['X'].shift(1)-1)*df['X $ Units']
pnl['Y Pnl'] = (df['Y']/df['Y'].shift(1)-1)*df['Y $ Units']
pnl['Total PNL'] = pnl.sum(axis=1)

pnl['Total PNL'].cumsum().plot(figsize=(20,12))

I know that if I just revert my positions (not use -1 in the y_port), the results will change and I'll get a positive return. However, I want to know what I did wrong. Using -Z for a mean-reversion strategy makes sense, and I would like to know where I made the mistake, so I can keep up with the rest of the book,

Solution

I think that you need to shift df['W $ Units'], df['X $ Units'] and df['Y $ Units'] with 1 as well. So to use df['Y $ Units'].shift(1) instead of df['Y $ Units'], for example.

The result you receive is not abysmal - it is unrealistic. Without shifting df['... $ Units'] you are looking ahead and using data that is not yet available.