python, pandas, shift, zipline

Python: KeyError 'shift'


I am new to Python and am trying to modify a pair trading script that I found here: https://github.com/quantopian/zipline/blob/master/zipline/examples/pairtrade.py

The original script uses only prices. I would like to fit my models on returns and use prices for the invested quantity, but I don't see how to do it.

I have tried:

  • defining a data frame of returns in main and using it in run
  • defining a data frame of returns in main as a global object and using it where needed in handle_data
  • defining a data frame of returns directly in handle_data

I assume the last option is the most appropriate, but then I get an error on the pandas 'shift' attribute.

More specifically, I try to define 'DataRegression' as follows:

DataRegression = data.copy()
DataRegression[Stock1]=DataRegression[Stock1]/DataRegression[Stock1].shift(1)-1
DataRegression[Stock2]=DataRegression[Stock2]/DataRegression[Stock2].shift(1)-1
DataRegression[Stock3]=DataRegression[Stock3]/DataRegression[Stock3].shift(1)-1
DataRegression = DataRegression.dropna(axis=0)

where 'data' is a data frame that contains prices, and Stock1, Stock2 and Stock3 are column names defined globally. Placed in handle_data, those lines raise the error:

File "A:\Apps\Python\Python.2.7.3.x86\lib\site-packages\zipline-0.5.6-py2.7.egg\zipline\utils\protocol_utils.py", line 85, in __getattr__
return self.__internal[key]
KeyError: 'shift'

Would anyone know why this happens and how to do it correctly?

Many Thanks, Vincent


Solution

  • This is an interesting idea. The easiest way to do this in zipline is to use the Returns transform, which adds a returns field to the event-frame (which is an ndict, not a pandas DataFrame, as someone pointed out; that is why .shift is not available on it).

    For this you have to add the transform to the initialize method: self.add_transform(Returns, 'returns', window_length=1)

    (make sure to add from zipline.transforms import Returns at the beginning).

    Then, inside the batch_transform you can access returns instead of prices:

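    import statsmodels.api as sm
    from zipline.transforms import batch_transform

    @batch_transform
    def ols_transform(data, sid1, sid2):
        """Computes regression coefficient (slope and intercept)
        via Ordinary Least Squares between two SIDs.
        """
        # Regress the returns of sid1 on the returns of sid2.
        p0 = data.returns[sid1]
        # prepend=False puts the constant column last, so params
        # come back in (slope, intercept) order.
        p1 = sm.add_constant(data.returns[sid2], prepend=False)
        slope, intercept = sm.OLS(p0, p1).fit().params
    
        return slope, intercept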
    

    Alternatively, you could create a batch_transform that converts prices to returns, as you wanted to do:

    @batch_transform
    def returns(data):
        return data.price / data.price.shift(1) - 1
    

    Then pass the result to the OLS transform, or do this computation inside the OLS transform itself.
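To see what the price-to-returns step produces, it can be tried outside zipline on a plain pandas DataFrame. This is only a sketch: the tickers AAA and BBB and their prices are made up for illustration, and the DataFrame stands in for what data.price would hold inside a batch_transform.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, for illustration only.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 121.0], "BBB": [50.0, 45.0, 54.0]},
    index=pd.date_range("2012-01-02", periods=3),
)

# Same formula as in the transform: today's price over yesterday's, minus one.
returns = prices / prices.shift(1) - 1

# The first row has no previous price, so it is NaN and gets dropped.
returns = returns.dropna()

# pct_change() is the built-in equivalent of the shift-based formula.
builtin = prices.pct_change().dropna()
same_as_builtin = returns.round(12).equals(builtin.round(12))

print(returns)
print(same_as_builtin)
```

Note that the first row is always lost, so a regression on returns sees one observation fewer than the window length. Dropping the NaN row (or using pct_change() followed by dropna()) before fitting is probably the safest choice.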

    HTH, Thomas
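
On the KeyError itself: the traceback in the question ends inside a __getattr__ that does self.__internal[key], which suggests that attribute access on the event object is forwarded to a dictionary of fields. Below is a minimal sketch of that behaviour using a hypothetical AttrDict class (an assumption for illustration, not zipline's actual ndict implementation). It only shows why calling a pandas method such as shift on a dict-backed object surfaces as KeyError: 'shift' rather than as an AttributeError.

```python
class AttrDict(object):
    """Hypothetical stand-in for a dict-backed event object:
    attribute access is forwarded to an internal dict."""

    def __init__(self, fields):
        self._internal = dict(fields)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so any unknown
        # name (including pandas method names) becomes a dict lookup.
        return self._internal[key]


# Made-up field values, for illustration only.
event = AttrDict({"price": 101.5, "volume": 2000})

# A stored field resolves through __getattr__.
print(event.price)

try:
    event.shift(1)  # 'shift' is not a stored field
except KeyError as err:
    error_message = "KeyError: %s" % err
    print(error_message)
```

In other words, the lookup of shift fails before pandas is involved at all. As far as I understand, handle_data only sees the current event for each stock, so there is no price history to shift there, whereas a batch_transform receives a window of data on which pandas operations work.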