Python: KeyError 'shift'

I am new to Python and am trying to modify a pair trading script that I found here: https://github.com/quantopian/zipline/blob/master/zipline/examples/pairtrade.py

The original script is designed to use only prices. I would like to use returns to fit my models and prices to compute the invested quantity, but I don't see how to do it.

I have tried:

defining a data frame of returns in the main block and passing it to run
defining a data frame of returns in the main block as a global object and using it where needed in handle_data
defining a data frame of returns directly in handle_data

I assume the last option is the most appropriate, but then I get an error about the pandas 'shift' attribute.

More specifically, I try to define 'DataRegression' as follows:

DataRegression = data.copy()
DataRegression[Stock1]=DataRegression[Stock1]/DataRegression[Stock1].shift(1)-1
DataRegression[Stock2]=DataRegression[Stock2]/DataRegression[Stock2].shift(1)-1
DataRegression[Stock3]=DataRegression[Stock3]/DataRegression[Stock3].shift(1)-1
DataRegression = DataRegression.dropna(axis=0)

where 'data' is a data frame which contains prices, and Stock1, Stock2 and Stock3 are column names defined globally. Those lines in handle_data raise the error:

File "A:\Apps\Python\Python.2.7.3.x86\lib\site-packages\zipline-0.5.6-py2.7.egg\zipline\utils\protocol_utils.py", line 85, in __getattr__
return self.__internal[key]
KeyError: 'shift'

Would anyone know why this happens and how to do it correctly?

Many Thanks, Vincent

Solution

This is an interesting idea. The easiest way to do this in zipline is to use the Returns transform, which adds a returns field to the event frame (which is an ndict, not a pandas DataFrame, as someone pointed out).
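To see where the KeyError comes from, here is a minimal sketch of an object whose attribute access is forwarded to a dictionary lookup, which is what the traceback line `return self.__internal[key]` suggests the ndict does. The EventFrame class below is a hypothetical stand-in written for illustration, not zipline's actual implementation.

```python
class EventFrame(object):
    """Hypothetical stand-in for an ndict-like container (not zipline code)."""

    def __init__(self, values):
        # Plain attribute assignment, so lookups of _internal never reach __getattr__.
        self._internal = dict(values)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails:
        # every unknown attribute becomes a dictionary lookup.
        return self._internal[key]


frame = EventFrame({"price": 10.0})
price = frame.price  # found in the internal dict

try:
    frame.shift(1)  # 'shift' is not a key, so the lookup fails
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

print(price, error_key)
```

Any pandas method name called on such an object fails the same way, because the object only knows the fields stored in it. With the Returns transform, the returns are instead provided as one of those fields.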

For this you have to add the transform to the initialize method: self.add_transform(Returns, 'returns', window_length=1)

(make sure to add from zipline.transforms import Returns at the beginning).

Then, inside the batch_transform you can access returns instead of prices:

@batch_transform
def ols_transform(data, sid1, sid2):
    """Computes regression coefficient (slope and intercept)
    via Ordinary Least Squares between two SIDs.
    """
    p0 = data.returns[sid1]
    p1 = sm.add_constant(data.returns[sid2])
    slope, intercept = sm.OLS(p0, p1).fit().params

    return slope, intercept
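As a standalone check of the regression step, the sketch below fits the same slope and intercept on synthetic returns using only NumPy. The helper name ols_slope_intercept and the sample data are assumptions made for this illustration. The constant column is placed last on purpose so that the coefficients come back as (slope, intercept); with statsmodels it is worth checking where add_constant puts the constant in your installed version, since that determines the order of params.

```python
import numpy as np


def ols_slope_intercept(y, x):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with the regressor first and the constant column last,
    # so the solution vector is ordered [slope, intercept].
    design = np.column_stack([x, np.ones_like(x)])
    coeffs = np.linalg.lstsq(design, y, rcond=None)[0]
    return coeffs[0], coeffs[1]


# Synthetic returns with a known relationship, for illustration only.
x_returns = np.array([0.01, -0.02, 0.015, 0.0, -0.005])
y_returns = 2.0 * x_returns + 0.001

slope, intercept = ols_slope_intercept(y_returns, x_returns)
print(slope, intercept)
```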

Alternatively, you could create a batch_transform that converts prices to returns, as you wanted to do:

@batch_transform
def returns(data):
    return data.price / data.price.shift(1) - 1

Then pass the result to the OLS transform, or do this computation inside the OLS transform itself.
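Outside of zipline, the price-to-returns conversion can be tried on an ordinary pandas DataFrame. This is a small sketch with made-up prices and column names; it shows the shift-based formula from the question next to pandas' built-in pct_change, which gives the same result when there are no missing prices.

```python
import pandas as pd

# Made-up closing prices for two hypothetical symbols.
prices = pd.DataFrame({
    "A": [100.0, 102.0, 101.0, 103.0],
    "B": [50.0, 50.5, 51.0, 50.0],
})

# Simple returns: price today divided by price yesterday, minus one.
# The first row has no previous price, so it is NaN and gets dropped.
returns = (prices / prices.shift(1) - 1).dropna()

# Built-in equivalent for data without gaps.
returns_builtin = prices.pct_change().dropna()

print(returns)
```

Keeping the original prices frame alongside returns makes it possible to fit the model on returns while still sizing positions from prices.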

HTH, Thomas