
(very) Simple quantstrat trading model using logistic regression


I have been playing around with the quantstrat backtesting package in R, and I would like some advice on a particular (poor) strategy.

The idea is to buy whenever a logistic regression model tells me that the market is going to go up (indicated by a 1 in the prediction column). Every day the logistic regression tells me the market is going to go up, I buy .orderqty = 10 shares of Google. On the day the logistic regression tells me the price is going to go down (indicated by a 0 in the prediction column), I dump all my current shares in Google and start again, waiting until it tells me to buy.

Questions:

Is my code correct as to what I am describing?

As you may notice, I have lagged the two input variables, e.g. momentum(lag(GOOG$close), n = 12),

that is, I want to use day t-1 to predict day t. Is this also correct? I do not want to use any indicator that might bias the prediction results.

There seems to be a bit of a learning curve with the quantstrat package, so I just want to make sure I am getting the basics correct.

The model:

rm(list=ls())
require(quantstrat)
require(PerformanceAnalytics)

set.seed(1234)

#setting up some initial parameters for the quantstrat trading model
initDate="2007-01-01"
from <- "2017-01-01"
to <- "2018-12-01"
init_equity <- 1000
adjustment <- TRUE

.orderqty <- 10
.txnfees <- -10

currency('USD')
Sys.setenv(TZ="UTC")

#Collect the data
symbols <- c('GOOG')
getSymbols(symbols, from=from, to=to, src="yahoo", adjust=TRUE)  

colnames(GOOG) <- c("open", "high", "low", "close", "volume", "adjusted")

# create the dependent variable for a logistic regression
GOOG$direction <- with(GOOG, ifelse(close >= open, 1, 0))

#create two basic input variables - lagged
GOOG$rsi <- RSI(lag(GOOG$close), nFast=14, nSlow = 26, nSig = 9, maType = SMA)
GOOG$momentum <- momentum(lag(GOOG$close), n = 12)

GOOG <- GOOG[complete.cases(GOOG), ] 

# create a training and test set
train_date <- nrow(GOOG) *0.8
train <- GOOG[1:train_date,]
test <- GOOG[-c(1:train_date),]

#Run a simple logistic regression and obtain predicted probabilities
lm.fit <- glm(direction ~ rsi + momentum, data = train, family = binomial)
summary(lm.fit)
pr.lm <- predict(lm.fit, test, type = "response")


# Extract the OHLC from the GOOG stock and match it with the test dates
TEST <- subset(GOOG, index(GOOG) %in% index(test))

# Add our predictions to the TEST data: 1 if the predicted probability is greater than 0.6, else 0
TEST$prediction <- ifelse(pr.lm > 0.6, 1, 0)

paste0("Accuracy: ", mean(TEST$direction == TEST$prediction))

# Now that we have a strategy, we want to buy every time the logistic model states that
# the direction would be a "1"

# Setting up the strategy
GOOG <- TEST
stock("GOOG", currency="USD", multiplier=1)
strategy.st <- portfolio.st <- account.st <- "LogisticRegressionStrategy"
rm.strat(strategy.st)
rm.strat(portfolio.st)
rm.strat(account.st)

initPortf(name = portfolio.st,
          symbols = symbols, 
          initDate = initDate, 
          currency = 'USD')

initAcct(name = account.st, 
         portfolios = portfolio.st, 
         initDate = initDate, 
         currency = 'USD',
         initEq = init_equity)

initOrders(portfolio.st,
           symbols = symbols,
           initDate = initDate)

strategy(strategy.st, store = TRUE)


# Adding the rules, enter at the low price when "prediction" = 1, taking transaction fees into account
add.rule(strategy = strategy.st,
         name = "ruleSignal",
         arguments = list(sigcol = "prediction",
                          sigval = 1,
                          orderqty = .orderqty,
                          ordertype = "market",
                          #orderside = "long", 
                          prefer = "Low", 
                          TxnFees = .txnfees, 
                          replace = FALSE),
         type = "enter",
         label = "EnterLONG")

# As soon as the Logistic regression predicts a "0" we dump all our shares in GOOG

add.rule(strategy.st, 
         name = "ruleSignal", 
         arguments = list(sigcol = "prediction", 
                          sigval = 0, 
                          #orderside = "short", 
                          ordertype = "market", 
                          orderqty = "all", 
                          TxnFees = .txnfees, 
                          replace = TRUE), 
         type = "exit", 
         label = "Exit2SHORT")


applyStrategy(strategy.st, portfolios = portfolio.st)

updatePortf(portfolio.st)
updateAcct(account.st)
updateEndEq(account.st)

chart.Posn(portfolio.st, Symbol = "GOOG", 
           TA="add_SMA(n = 10, col = 2); add_SMA(n = 30, col = 4)")

Solution

  • Looks like you're almost there. You fit the model on the training data, and made sure to do the backtest on the test set, which is the right thing to do.
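
    As a rough illustration of that split, here is a minimal sketch on synthetic data (toy, n_train and the simulated columns are made up for this example and are not part of the strategy). The point is that the split is chronological: the first 80% of the rows are used for fitting and the rest are held out.

    ```r
    # Synthetic, time-ordered data; purely illustrative
    set.seed(1)
    n <- 100
    toy <- data.frame(x = rnorm(n))
    toy$y <- rbinom(n, 1, plogis(0.5 * toy$x))

    # Chronological split: no shuffling, earlier rows train, later rows test
    n_train <- floor(0.8 * n)
    train <- toy[seq_len(n_train), ]
    test  <- toy[-seq_len(n_train), ]

    # Fit on the training rows only, then predict probabilities out of sample
    fit  <- glm(y ~ x, data = train, family = binomial)
    prob <- predict(fit, newdata = test, type = "response")
    ```

    Using floor() keeps the row count an integer, which avoids relying on R silently truncating a fractional index such as nrow(x) * 0.8.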

    Some things you want to be careful about, though: don't set prefer = "Low" in add.rule for enter signals. In real trading you never know in advance where the low of the next bar will be, so you cannot count on being filled there.

    I let the logistic regression here predict 1 bar ahead from the current bar, because this is what you'd be doing if you were making these predictions "online", in real time. This is OK provided we only trade on the predicted probabilities and never use direction_fwd itself as a trading signal, because that would introduce look-ahead bias.
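
    To see what the forward-shifted target looks like, here is a small sketch with the xts package (the dates and values are invented for illustration). A negative k in lag.xts pulls the next bar's value back onto the current row:

    ```r
    library(xts)

    # Toy daily direction series; values are made up
    dates <- as.Date("2018-01-01") + 0:4
    direction <- xts(c(1, 0, 1, 1, 0), order.by = dates)

    # k = -1 leads the series: each row now holds the NEXT bar's direction
    direction_fwd <- lag.xts(direction, k = -1)

    # Side by side: direction_fwd is 0, 1, 1, 0, NA (the last bar has no successor)
    toy <- merge(direction, direction_fwd)
    ```

    The trailing NA is why the accuracy calculation further down needs na.rm = TRUE.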

    To make re-running the code easier, I also store the market data in an environment, .data, so you can regenerate the data in GOOG for applyStrategy without requesting it again from Yahoo when rerunning parts of the code.

    You also probably want to cap the number of times you'll enter a position. You can do this with addPosLimit. And you likely don't want to buy on every bar where the probability is above 0.6, but rather just the first time (the cross), so I've introduced signal code to handle this.
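
    The difference between "every bar above the threshold" and "only the cross" can be sketched in base R as follows. This is only an approximation of the idea behind cross = TRUE, not quantstrat's own implementation, and the probabilities are invented:

    ```r
    # Made-up predicted probabilities for six consecutive bars
    pred_prob <- c(0.55, 0.62, 0.65, 0.58, 0.61, 0.70)

    # TRUE on every bar where the probability is above the threshold
    above <- pred_prob > 0.6

    # State on the previous bar (assumption: the first bar counts as "not above")
    prev_above <- c(FALSE, head(above, -1))

    # TRUE only on bars where the threshold is newly crossed from below
    first_cross <- above & !prev_above

    which(above)        # bars 2, 3, 5, 6
    which(first_cross)  # bars 2, 5
    ```

    With the plain threshold the rule would try to buy on four bars, while the cross version fires only twice.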

    Remember that by default in quantstrat the order fills on the next bar of data (here, at the open price of the next bar, since prefer = "Open"). This default exists to make fills more realistic (it matters more for intraday bar data or tick data rows), and I think it is what you want here: you don't know the RSI and momentum values for the current bar until that bar has closed, so filling on the next bar's open makes good sense.
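
    The timing can be pictured with a toy example in base R. This only illustrates which bar the fill price comes from and does not reproduce quantstrat's order handling; the prices and the signal are invented:

    ```r
    # Four made-up daily bars; the signal is computed from data known at the close of bar 2
    bars <- data.frame(
      open   = c(100, 101, 103, 102),
      close  = c(101, 102, 102, 104),
      signal = c(FALSE, TRUE, FALSE, FALSE)
    )

    signal_bar <- which(bars$signal)   # bar on which the signal is observed
    fill_bar   <- signal_bar + 1       # the order is filled on the following bar
    fill_price <- bars$open[fill_bar]  # at that bar's open, i.e. 103
    ```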

    rm(list=ls())
    require(quantstrat)
    require(PerformanceAnalytics)
    
    set.seed(1234)
    
    #setting up some initial parameters for the quantstrat trading model
    initDate="2007-01-01"
    from <- "2017-01-01"
    to <- "2018-12-01"
    init_equity <- 1000
    adjustment <- TRUE
    
    .orderqty <- 10
    .txnfees <- -10
    
    currency('USD')
    Sys.setenv(TZ="UTC")
    
    #Collect the data
    symbols <- c('GOOG')
    .data <- new.env()
    getSymbols(symbols, from=from, to=to, src="yahoo", adjust=TRUE, env = .data)  
    
    colnames(.data$GOOG) <- c("open", "high", "low", "close", "volume", "adjusted")
    
    mdata <- .data$GOOG
    
    # create the dependent variable for a logistic regression
    mdata$direction <- with(mdata, ifelse(close >= open, 1, 0))
    
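    # create two basic input variables (not lagged here; the target is shifted forward instead)
    # note: nFast/nSlow/nSig are MACD arguments, so RSI only needs n and maType
    mdata$rsi <- RSI(mdata$close, n = 14, maType = SMA)
    mdata$momentum <- momentum(mdata$close, n = 12)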
    
    mdata <- mdata[complete.cases(mdata), ] 
    # next bar's direction (k = -1 leads the series by one bar); used only as the model target
    mdata$direction_fwd <- lag.xts(mdata$direction, k = -1)
    # create a training and test set
    train_date <- nrow(mdata) *0.8
    train <- mdata[1:train_date,]
    test <- mdata[-c(1:train_date),]
    
    
    
    #Run a simple logistic regression and obtain predicted probabilities
    lm.fit <- glm(direction_fwd ~ rsi + momentum, data = train, family = binomial)
    summary(lm.fit)
    pr.lm <- predict(lm.fit, test, type = "response")
    test$pred_prob <- pr.lm
    
    # Add our predictions to the test data: 1 if the predicted probability is greater than 0.6, else 0
    test$prediction <- ifelse(pr.lm > 0.6, 1, 0)
    
    paste0("Accuracy: ", mean(test$direction_fwd == test$prediction, na.rm = TRUE))
    
    
    # Simple way to run applyStrategy is to make sure the data for the symbol is in a variable with its name, like so:
    GOOG <- test
    
    
    stock("GOOG", currency="USD", multiplier=1)
    strategy.st <- portfolio.st <- account.st <- "LogisticRegressionStrategy"
    rm.strat(strategy.st)
    rm.strat(portfolio.st)
    rm.strat(account.st)
    
    
    
    initPortf(name = portfolio.st,
              symbols = symbols, 
              initDate = initDate, 
              currency = 'USD')
    
    initAcct(name = account.st, 
             portfolios = portfolio.st, 
             initDate = initDate, 
             currency = 'USD',
             initEq = init_equity)
    
    initOrders(portfolio.st,
               symbols = symbols,
               initDate = initDate)
    
    strategy(strategy.st, store = TRUE)
    
    nMult_orderqty <- 2
    addPosLimit(portfolio.st, symbol = "GOOG", timestamp = initDate, maxpos = nMult_orderqty * .orderqty)
    
    # Buy when prob exceeds 0.6 for the first time, using cross= TRUE
    add.signal(strategy = strategy.st,
             name = "sigThreshold",
             arguments = list(threshold=0.6, column="pred_prob", relationship="gt", cross= TRUE),
             label = "longSig")
    
    # Exit when prob drops below 0.5 for the first time
    add.signal(strategy = strategy.st,
               name = "sigThreshold",
               arguments = list(threshold=0.5, column="pred_prob", relationship="lt", cross= TRUE),
               label = "exitLongSig")
    
    # Adding the rules: enter at the next bar's open when "longSig" fires, taking transaction fees into account
    add.rule(strategy = strategy.st,
             name = "ruleSignal",
             arguments = list(sigcol = "longSig",
                              sigval = 1,
                              orderqty = .orderqty,
                              ordertype = "market",
                              orderside = "long",
                              osFUN = osMaxPos,
                              prefer = "Open",  # Never know the low in advance. Use the open, as it is for the next day (be aware that the open price for bar data has its own problems too)
                              TxnFees = .txnfees, 
                              replace = FALSE),
             type = "enter",
             label = "EnterLONG")
    
    # As soon as the predicted probability crosses below 0.5 ("exitLongSig") we dump all our shares in GOOG
    
    add.rule(strategy.st, 
             name = "ruleSignal", 
             arguments = list(sigcol = "exitLongSig", 
                              sigval = 1, 
                              ordertype = "market", 
                              orderside = "long",
                              orderqty = "all", 
                              TxnFees = .txnfees, 
                              replace = TRUE), 
             type = "exit", 
             label = "Exit2SHORT")
    
    
    applyStrategy(strategy.st, portfolios = portfolio.st)
    
    updatePortf(portfolio.st)
    updateAcct(account.st)
    updateEndEq(account.st)
    
    chart.Posn(portfolio.st, Symbol = "GOOG", 
               TA="add_SMA(n = 10, col = 2); add_SMA(n = 30, col = 4)")