zipline backtesting using non-US (European) intraday data


I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:

                        BARC    HSBA     LLOY     STAN
Date                                                  
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00

I've followed the moving-averages tutorial, replacing "AAPL" with my own symbol code and making the history calls with "1m" data instead of "1d".

Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.

However, all sorts of problems arise relating to TradingEnvironment. According to the source code:

# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
#       from zipline.finance import trading
#       trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
#       lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
#       with lse:
# the code here will have lse as the global trading.environment
#           algo.run(start, end)

However, using the context manager doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_closes, the times are for the US market).
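To illustrate the mismatch, here is a minimal standard-library sketch (not zipline code). It assumes the bars are stamped in London local time and uses the regular 09:30 New York and 08:00 London opening times; the helper name session_open_utc is made up for this example.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")


def session_open_utc(day, open_time, tz_name):
    """Return the UTC instant at which a session opens on `day`."""
    local_open = datetime.combine(day, open_time, tzinfo=ZoneInfo(tz_name))
    return local_open.astimezone(UTC)


# First bar of the sample data, assumed to be London wall-clock time.
first_bar = datetime(2014, 7, 1, 8, 30, tzinfo=ZoneInfo("Europe/London")).astimezone(UTC)

nyse_open = session_open_utc(first_bar.date(), time(9, 30), "America/New_York")
lse_open = session_open_utc(first_bar.date(), time(8, 0), "Europe/London")

# Against a US calendar the bar falls hours before the open;
# against a London calendar it is inside the session.
print("before US open:", first_bar < nyse_open)
print("after London open:", first_bar >= lse_open)
```

With a US calendar in place, every morning bar from a London data set would land before the market open, which would be consistent with the errors described above.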

My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?

N.B. I've seen some modules on GitHub that seem related to the trading calendars (tradingcalendar_lse.py, tradingcalendar_tse.py, etc.), but they appear to handle data only at the daily level. I would need to fix:

  • open/close times
  • reference data for the benchmark
  • and probably more ...

Solution

  • I've got this working after fiddling around with the tutorial notebook. Code sample below. It uses the DataFrame mid, which has the layout described in the original question. A few points bear mentioning:

    1. Trading calendar: I create one manually and assign it to trading.environment, using non_trading_days from tradingcalendar_lse.py. Alternatively, you could create one that fits your data exactly (although that could be a problem for out-of-sample data). There are two fields that you need to define: trading_days and open_and_closes.

    2. sim_params: The default start/end values cause problems because they aren't timezone-aware, so you must create a sim_params object and pass start/end parameters with a timezone.

    3. run() must be called with the argument overwrite_sim_params=False, because otherwise calculate_first_open/close raise timestamp errors.
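
    The first two points can be sketched without zipline, using only the standard library. This is an illustration under assumptions, not the library's calendar code: the HOLIDAYS set is a placeholder for tradingcalendar_lse.non_trading_days, the helper names are made up, and the session is taken to run from 08:00 to 16:30 London time.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
UTC = ZoneInfo("UTC")

# Placeholder holiday set; in practice this would come from
# tradingcalendar_lse.non_trading_days.
HOLIDAYS = frozenset({date(2014, 8, 25)})  # UK summer bank holiday


def trading_days(first, last, holidays=HOLIDAYS):
    """Weekdays between first and last (inclusive) that are not holidays."""
    days = []
    day = first
    while day <= last:
        if day.weekday() < 5 and day not in holidays:
            days.append(day)
        day += timedelta(days=1)
    return days


def open_and_close_utc(day, open_local=time(8, 0), close_local=time(16, 30)):
    """Convert one session's London wall-clock open/close to UTC."""
    market_open = datetime.combine(day, open_local, tzinfo=LONDON).astimezone(UTC)
    market_close = datetime.combine(day, close_local, tzinfo=LONDON).astimezone(UTC)
    return market_open, market_close


# Friday to Tuesday, spanning a weekend and a Monday bank holiday.
days = trading_days(date(2014, 8, 22), date(2014, 8, 26))
sessions = [open_and_close_utc(d) for d in days]

# Timezone-aware simulation bounds: first open and last close.
start, end = sessions[0][0], sessions[-1][1]
print(days)
print(start.isoformat(), end.isoformat())
```

    Converting from London wall-clock time, rather than adding a fixed number of minutes to midnight UTC, keeps the open and close correct on both sides of a daylight-saving change. The resulting rows could be loaded into a DataFrame indexed by trading day with market_open and market_close columns.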

    I should mention that it's also possible to pass pandas Panel data, with the fields open, high, low, close, price and volume in the minor_axis. In this case, however, all of these fields are mandatory; otherwise errors are raised.
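
    A small pre-flight check can catch a missing field before the data reaches zipline. This is a hypothetical helper, not part of zipline; the field list is simply the one given above.

```python
# Fields listed above as mandatory when passing Panel data.
REQUIRED_FIELDS = ("open", "high", "low", "close", "price", "volume")


def missing_fields(available):
    """Return the required fields that are absent from `available`."""
    present = set(available)
    return [field for field in REQUIRED_FIELDS if field not in present]


# Example: a data set that only carries price and volume.
print(missing_fields(["price", "volume"]))
```

    For a Panel, the argument would be the labels of its minor_axis.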

    Note that this code only produces a daily summary of the performance. I'm sure there must be a way to get the result at minute resolution (I thought this was set by emission_rate, but apparently it's not). If anybody knows, please comment and I'll update the code. I'm also not sure which API call invokes analyze: when using the %%zipline magic in IPython, as in the tutorial, the analyze() method gets called automatically. How do I do this manually?

    import pytz
    from datetime import datetime

    import pandas as pd  # used below as pd
    
    from zipline.algorithm import TradingAlgorithm
    from zipline.utils import tradingcalendar
    from zipline.utils import tradingcalendar_lse
    from zipline.finance.trading import TradingEnvironment
    from zipline.api import order_target, record, symbol, history, add_history
    from zipline.finance import trading
    
    def initialize(context):
        # Register 2 histories that track minute prices,
        # one with a 10-bar window and one with a 30-bar window
        add_history(10, '1m', 'price')
        add_history(30, '1m', 'price')
    
        context.i = 0
    
    
    def handle_data(context, data):
        # Skip bars until the 30-bar window is full
        context.i += 1
        if context.i < 30:
            return
    
        # Compute averages
        # history() has to be called with the same params
        # from above and returns a pandas dataframe.
        short_mavg = history(10, '1m', 'price').mean()
        long_mavg = history(30, '1m', 'price').mean()
    
        sym = symbol('BARC')
    
        # Trading logic
        if short_mavg[sym] > long_mavg[sym]:
            # order_target orders as many shares as needed to
            # achieve the desired number of shares.
            order_target(sym, 100)
        elif short_mavg[sym] < long_mavg[sym]:
            order_target(sym, 0)
    
        # Save values for later inspection
        record(BARC=data[sym].price,
               short_mavg=short_mavg[sym],
               long_mavg=long_mavg[sym])
    
    def analyze(context, perf):
        perf["pnl"].plot(title="Strategy P&L")
    
    # Create algorithm object passing in initialize and
    # handle_data functions
    
    # This is needed to get the correct calendar. It assumes that the market data
    # has the right index for tradeable days.
    # Passing in env_trading_calendar=tradingcalendar_lse doesn't appear to work,
    # as it doesn't implement open_and_closes.
    trading.environment = TradingEnvironment(bm_symbol='^FTSE', exchange_tz='Europe/London')
    #trading.environment.trading_days = mid.index.normalize().unique()
    trading.environment.trading_days = pd.date_range(start=mid.index.normalize()[0],
                                                     end=mid.index.normalize()[-1],
                                                     freq=pd.tseries.offsets.CDay(holidays=tradingcalendar_lse.non_trading_days))
    
    # Open and close are set as minute offsets from midnight of each trading day:
    # 07:00 and 15:30. Assuming the index is in UTC, that is 08:00 and 16:30
    # London time during British Summer Time only.
    trading.environment.open_and_closes = pd.DataFrame(index=trading.environment.trading_days,columns=["market_open","market_close"])
    trading.environment.open_and_closes.market_open = (trading.environment.open_and_closes.index + pd.to_timedelta(60*7,unit="T")).to_pydatetime()
    trading.environment.open_and_closes.market_close = (trading.environment.open_and_closes.index + pd.to_timedelta(60*15+30,unit="T")).to_pydatetime()
    
    
    from zipline.utils.factory import create_simulation_parameters
    sim_params = create_simulation_parameters(
       start = pd.to_datetime("2014-07-01 08:30:00").tz_localize("Europe/London").tz_convert("UTC"),  #Bug in code doesn't set tz if these are not specified (finance/trading.py:SimulationParameters.calculate_first_open[close])
       end = pd.to_datetime("2014-07-24 16:30:00").tz_localize("Europe/London").tz_convert("UTC"),
       data_frequency = "minute",
       emission_rate = "minute",
       sids = ["BARC"])
    algo_obj = TradingAlgorithm(initialize=initialize, 
                                handle_data=handle_data,
                                sim_params=sim_params)
    
    # Run algorithm
    perf_manual = algo_obj.run(mid,overwrite_sim_params=False) # overwrite == True calls calculate_first_open[close] (see above)