
Build a Logistic Regression Model for Shares


The data I am working with contains the closing prices of 10 shares from the S&P 500 index.

Data:

> dput(head(StocksData))
structure(list(ACE = c(56.86, 56.82, 56.63, 56.39, 55.97, 55.23
), AMD = c(8.47, 8.77, 8.91, 8.69, 8.83, 9.19), AFL = c(51.83, 
50.88, 50.78, 50.5, 50.3, 49.65), APD = c(81.59, 80.38, 80.03, 
79.61, 79.76, 79.77), AA = c(15.12, 15.81, 15.85, 15.66, 15.71, 
15.78), ATI = c(53.54, 52.37, 52.53, 51.91, 51.32, 51.45), AGN = c(69.77, 
69.53, 69.69, 69.98, 68.99, 68.75), ALL = c(29.32, 29.03, 28.99, 
28.66, 28.47, 28.2), MO = c(20.09, 20, 20.07, 20.16, 20, 19.88
), AMZN = c(184.22, 185.01, 187.42, 185.86, 185.49, 184.68)), row.names = c(NA, 
6L), class = "data.frame")

In the following part, I calculate the daily percentage changes of the 10 shares:

perc_change <- (StocksData[-1, ] - StocksData[-nrow(StocksData), ])/StocksData[-nrow(StocksData), ] * 100
perc_change

Output:

#    ACE  AMD   AFL    APD    AA   ATI   AGN   ALL    MO  AMZN
#2 -0.07  3.5 -1.83 -1.483  4.56 -2.19 -0.34 -0.99 -0.45  0.43
#3 -0.33  1.6 -0.20 -0.435  0.25  0.31  0.23 -0.14  0.35  1.30
#4 -0.42 -2.5 -0.55 -0.525 -1.20 -1.18  0.42 -1.14  0.45 -0.83
#5 -0.74  1.6 -0.40  0.188  0.32 -1.14 -1.41 -0.66 -0.79 -0.20
#6 -1.32  4.1 -1.29  0.013  0.45  0.25 -0.35 -0.95 -0.60 -0.44
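The same percentage changes can also be computed with base R's `diff()`, which on a numeric matrix takes differences down each column. A minimal sketch using three of the columns from the data above (the name `perc_change_alt` is only illustrative):

```r
# Closing prices for three of the shares, copied from the dput() output above
StocksData <- data.frame(
  AMD  = c(8.47, 8.77, 8.91, 8.69, 8.83, 9.19),
  AA   = c(15.12, 15.81, 15.85, 15.66, 15.71, 15.78),
  AMZN = c(184.22, 185.01, 187.42, 185.86, 185.49, 184.68)
)

prices <- as.matrix(StocksData)
# diff() subtracts each row from the next, column by column; dividing by
# every row except the last turns the differences into relative changes.
perc_change_alt <- 100 * diff(prices) / prices[-nrow(prices), ]
round(perc_change_alt, 2)
```

Rounded to two decimals, the first row gives 3.54, 4.56 and 0.43 for AMD, AA and AMZN, the same values as in the output above.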

With the above code I get the daily rates of change, from which I take the latest N (N should be in [1, 10]). I want to build a logistic regression model that uses them to predict the change of the next day (N + 1), i.e., "increase" or "decrease".
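One way to turn the latest N rates of change into model inputs is sketched below for a single share, using base R's `embed()`. The helper `make_lagged()` and the column names `lag1`, ..., `lagN` and `direction` are hypothetical, not part of the original code; each row holds the N most recent daily changes plus a factor saying whether the following day's change was an "increase" or a "decrease".

```r
# Hypothetical helper: each row holds the latest N daily % changes of one
# share (lag1 = most recent) and the direction of the following day's change.
make_lagged <- function(changes, N) {
  stopifnot(N >= 1, length(changes) > N)
  # embed() puts x[t] in column 1 and x[t-1], ..., x[t-N] in columns 2..N+1
  m <- embed(changes, N + 1)
  out <- as.data.frame(m[, -1, drop = FALSE])
  names(out) <- paste0("lag", seq_len(N))
  # Assumption: a change of exactly 0 is counted as "decrease"
  out$direction <- factor(ifelse(m[, 1] > 0, "increase", "decrease"),
                          levels = c("decrease", "increase"))
  out
}

# Daily % changes of AMD, computed from the closing prices in the question
amd <- c(8.47, 8.77, 8.91, 8.69, 8.83, 9.19)
amd_change <- 100 * diff(amd) / head(amd, -1)

lagged <- make_lagged(amd_change, N = 2)
lagged
```

With enough rows, `glm(direction ~ ., family = binomial(link = "logit"), data = lagged)` would then model the probability of "increase", because for a factor response `glm()` treats the first level ("decrease") as failure and the other level as success.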

First, I split the data into two chunks, a training set and a test set (NOTE: as the test set I must take the last 40 sessions, and as the training set the 85 sessions before the test set):

n <- nrow(StocksData)
testset <- tail(StocksData, 40)                        # last 40 sessions
trainset <- StocksData[(n - 40 - 85 + 1):(n - 40), ]   # the 85 sessions before the test set

Then I fit the model:

model <- glm(Here???,family=binomial(link='logit'),data=trainset)

The problem I am facing is that I don't understand what to include in the glm function. I have studied many logistic regression models, and I think my data does not contain the object (the response variable of the formula) that I need to place there.

Any help with this part of my code?
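For reference, the first argument of `glm()` is a model formula of the form `response ~ predictors`, and with `family = binomial` the response has to be 0/1 (or a two-level factor), so the data must first contain such a column. The sketch below uses randomly generated data, not the share prices, purely to show the shape of the call together with the 40/85 split from the note; the column names `lag1`, `lag2` and `up`, and the 0.5 cut-off, are assumptions for illustration.

```r
set.seed(1)
# Synthetic stand-in data (NOT the share prices): 165 sessions with two
# hypothetical predictors and a 0/1 response 'up' (1 = next day increased).
n <- 165
dat <- data.frame(lag1 = rnorm(n), lag2 = rnorm(n))
dat$up <- rbinom(n, 1, plogis(0.8 * dat$lag1 - 0.5 * dat$lag2))

# Chronological split as in the note: last 40 rows test, the 85 rows before them train
testset  <- tail(dat, 40)
trainset <- dat[(n - 40 - 85 + 1):(n - 40), ]

# glm()'s first argument is a formula "response ~ predictors";
# "up ~ ." uses every other column of trainset as a predictor.
model <- glm(up ~ ., family = binomial(link = "logit"), data = trainset)

# Predicted probabilities of up == 1 on the test set, cut at 0.5 (an assumption)
prob <- predict(model, newdata = testset, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

table(predicted = pred, actual = testset$up)
mean(pred == testset$up)   # share of correctly classified test sessions
```

With real data, `up` would be built from the share changes (for example, 1 if the next session's change is positive) and `lag1`, `lag2` would be replaced by the latest N rates of change.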


Solution

  • Based on what you shared, you need to predict an increase or a decrease when new data about the portfolio arrives. In that case, you first need to define a target variable. We can do that by counting the positive and negative changes on each day: the target is 1 if there are more positive than negative changes (an increase) and 0 otherwise (no increase). The data you shared is very small, but I have sketched the code so that you can apply the training/test approach to the modeling. Here is the code:

    We start from perc_change and count the positive and negative changes on each day:

    #Build variables
    #Store number of positive and negative changes in each row
    i <- names(perc_change)
    perc_change$Neg <- rowSums(perc_change[, i] < 0)
    perc_change$Pos <- rowSums(perc_change[, i] > 0)
    

    Now, we create the target variable with a conditional:

    #Build target variable
    perc_change$Target <- ifelse(perc_change$Pos>perc_change$Neg,1,0)
    

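    We make a copy of the data and remove the helper count columns (Neg and Pos define Target directly, so keeping them as predictors would leak the answer into the model):

    #Copy data and drop the helper count columns
    perc_change2 <- perc_change
    perc_change2$Neg <- NULL
    perc_change2$Pos <- NULL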
    

    With perc_change2 the input is ready, and you should split it into training and test data. I will skip that here because the sample data is too small and go directly to the model:

    #Train the model (on all rows here; with real data, fit it on the training set only)
    model <- glm(Target~.,family=binomial(link='logit'),data=perc_change2)
    

    With that model, you can go on to evaluate its performance on the test set. Please let me know if you need more details.
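Note that Target in this answer describes the same day as the changes it is computed from, whereas the question asks about the next day. A minimal sketch of one way to shift it, assuming the rows of perc_change are consecutive trading days in date order; `next_day` and `TargetNext` are hypothetical names. It rebuilds the data from the dput() output in the question and counts signs with `rowSums()`:

```r
# Closing prices from the dput() output in the question
StocksData <- data.frame(
  ACE  = c(56.86, 56.82, 56.63, 56.39, 55.97, 55.23),
  AMD  = c(8.47, 8.77, 8.91, 8.69, 8.83, 9.19),
  AFL  = c(51.83, 50.88, 50.78, 50.5, 50.3, 49.65),
  APD  = c(81.59, 80.38, 80.03, 79.61, 79.76, 79.77),
  AA   = c(15.12, 15.81, 15.85, 15.66, 15.71, 15.78),
  ATI  = c(53.54, 52.37, 52.53, 51.91, 51.32, 51.45),
  AGN  = c(69.77, 69.53, 69.69, 69.98, 68.99, 68.75),
  ALL  = c(29.32, 29.03, 28.99, 28.66, 28.47, 28.2),
  MO   = c(20.09, 20, 20.07, 20.16, 20, 19.88),
  AMZN = c(184.22, 185.01, 187.42, 185.86, 185.49, 184.68)
)

# Daily % changes, as in the question
prev <- StocksData[-nrow(StocksData), ]
perc_change <- (StocksData[-1, ] - prev) / prev * 100

# Same-day Target, as in the answer: 1 if more shares rose than fell
perc_change$Target <- as.integer(rowSums(perc_change > 0) > rowSums(perc_change < 0))

# Hypothetical next-day version: pair each day's changes with the Target
# of the following day, dropping the last day (it has no "tomorrow" yet)
n <- nrow(perc_change)
next_day <- perc_change[-n, setdiff(names(perc_change), "Target")]
next_day$TargetNext <- perc_change$Target[-1]
next_day$TargetNext
```

On the six sessions shown, Target is 0, 1, 0, 0, 0 and TargetNext is 1, 0, 0, 0. With the full history, `glm(TargetNext ~ ., family = binomial(link = "logit"), data = next_day)` would then model the direction of the next session rather than the current one.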