The data I am working with contains the closing prices of 10 stocks from the S&P 500 index.
Data:
> dput(head(StocksData))
structure(list(ACE = c(56.86, 56.82, 56.63, 56.39, 55.97, 55.23
), AMD = c(8.47, 8.77, 8.91, 8.69, 8.83, 9.19), AFL = c(51.83,
50.88, 50.78, 50.5, 50.3, 49.65), APD = c(81.59, 80.38, 80.03,
79.61, 79.76, 79.77), AA = c(15.12, 15.81, 15.85, 15.66, 15.71,
15.78), ATI = c(53.54, 52.37, 52.53, 51.91, 51.32, 51.45), AGN = c(69.77,
69.53, 69.69, 69.98, 68.99, 68.75), ALL = c(29.32, 29.03, 28.99,
28.66, 28.47, 28.2), MO = c(20.09, 20, 20.07, 20.16, 20, 19.88
), AMZN = c(184.22, 185.01, 187.42, 185.86, 185.49, 184.68)), row.names = c(NA,
6L), class = "data.frame")
Next, I calculate the daily percentage changes of the 10 stocks:
perc_change <- (StocksData[-1, ] - StocksData[-nrow(StocksData), ])/StocksData[-nrow(StocksData), ] * 100
perc_change
Output:
# ACE AMD AFL APD AA ATI AGN ALL MO AMZN
#2 -0.07 3.5 -1.83 -1.483 4.56 -2.19 -0.34 -0.99 -0.45 0.43
#3 -0.33 1.6 -0.20 -0.435 0.25 0.31 0.23 -0.14 0.35 1.30
#4 -0.42 -2.5 -0.55 -0.525 -1.20 -1.18 0.42 -1.14 0.45 -0.83
#5 -0.74 1.6 -0.40 0.188 0.32 -1.14 -1.41 -0.66 -0.79 -0.20
#6 -1.32 4.1 -1.29 0.013 0.45 0.25 -0.35 -0.95 -0.60 -0.44
The code above gives me the daily rates of change, from which I take the latest N (N should be in [1, 10]). I want to build a logistic regression model that predicts the direction of the next day's change (day N + 1), i.e. "increase" or "decrease".
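As a rough illustration of what "the latest N changes predict day N + 1" could look like in a data frame, here is a minimal sketch for a single stock. It uses synthetic numbers rather than the real data, and the names chg, N, dat, Up and lag1..lagN are made up for the example. It relies on base R's embed(), which places the most recent value in the first column and the older values to its right.

```r
# Synthetic percentage changes for one hypothetical stock (not the real data)
set.seed(1)
chg <- round(rnorm(30), 2)

N <- 3  # number of lagged changes used as predictors (assumed; the question allows 1..10)

# Each row of embed() holds: change on day t, then days t-1, ..., t-N
lagged <- embed(chg, N + 1)

dat <- data.frame(
  Up = as.integer(lagged[, 1] > 0),  # 1 = increase on day t, 0 = otherwise
  lagged[, -1, drop = FALSE]         # the N changes preceding day t
)
names(dat)[-1] <- paste0("lag", seq_len(N))

# The response goes on the left of the formula, the lags on the right
model <- glm(Up ~ ., family = binomial(link = "logit"), data = dat)
```

With this layout the left-hand side of the glm formula is the 0/1 direction of the day being predicted, and every predictor comes from earlier days only.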
First, I split the data into a training set and a test set.
(NOTE: the test set must be the last 40 sessions, and the training set must be the 85 sessions preceding the test set!)
trainset <- head(StocksData, 870)
testset <- tail(StocksData, 40)
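Note that head(x, 870) returns the first 870 rows of x. If the intent is the 85 sessions immediately before the last 40, one possible way is plain index arithmetic. The sketch below uses a synthetic data frame called changes as a stand-in, and n_test, n_train, test_idx and train_idx are names invented for the example.

```r
# Synthetic stand-in for the data (hypothetical values, two columns only)
set.seed(42)
changes <- data.frame(ACE = rnorm(200), AMD = rnorm(200))

n_test  <- 40  # last 40 sessions
n_train <- 85  # the 85 sessions just before them

n <- nrow(changes)
test_idx  <- (n - n_test + 1):n
train_idx <- (n - n_test - n_train + 1):(n - n_test)

trainset <- changes[train_idx, ]
testset  <- changes[test_idx, ]
```

Because the rows are in time order, the training window ends exactly one session before the test window starts, so no test-period information reaches the training set.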
Then I fit the model:
model <- glm(Here???,family=binomial(link='logit'),data=trainset)
The problem I am facing is that I don't understand what to put in the formula argument of the glm function. I have studied many logistic regression models, and I think my data does not contain the object that belongs there.
Any help with this part of my code?
Based on what you shared, you want to predict an increase or decrease in the portfolio you mentioned when new data arrives. For that, you first need to define a target variable. One way is to count the positive and negative changes in each row, then set the target to 1 if there are more positive than negative changes (an increase) and to 0 otherwise (no increase). The data you shared is very small, but I have sketched the code so that you can apply the train/test approach to the modeling. Here is the code:
We start from perc_change and compute the positive and negative counts:
#Build variables
#Store the number of positive and negative changes
i <- names(perc_change)
perc_change$Neg <- apply(perc_change[,i],1,function(x) length(which(x<0)))
perc_change$Pos <- apply(perc_change[,i],1,function(x) length(which(x>0)))
Now, we create the target variable with a conditional:
#Build target variable
perc_change$Target <- ifelse(perc_change$Pos>perc_change$Neg,1,0)
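Note that Target as built here describes the same day as the predictors in its row. If the goal is the next day's direction, one possible variation (an assumption about the intent, not something the data requires) is to pair each row's changes with the following row's target. The sketch uses three of the ten columns, with values copied from the output in the question; shifted and TargetNext are names invented for the example.

```r
# Three columns of perc_change, values taken from the printed output
perc_change <- data.frame(
  AMD = c(3.5, 1.6, -2.5, 1.6, 4.1),
  AA  = c(4.56, 0.25, -1.20, 0.32, 0.45),
  ATI = c(-2.19, 0.31, -1.18, -1.14, 0.25)
)

# Same-day target: 1 if more positive than negative changes, else 0
i <- names(perc_change)
pos <- rowSums(perc_change[, i] > 0)
neg <- rowSums(perc_change[, i] < 0)
target <- ifelse(pos > neg, 1, 0)

# Shift by one day: predictors from day t, target from day t + 1
n <- nrow(perc_change)
shifted <- perc_change[-n, ]   # drop the last day (no next day available)
shifted$TargetNext <- target[-1]
```

The last row is dropped because its next-day outcome is not yet known; that row is exactly what a fitted model would be asked to predict.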
We make a copy of the data and remove the variables that are no longer needed:
#Replicate data
perc_change2 <- perc_change
perc_change2$Neg <- NULL
perc_change2$Pos <- NULL
With perc_change2 the input is ready, and you should split it into train/test sets. I will not do that here because the sample is too small, so I go directly to the model:
#Train the model; the example has too few rows for a train/test split, but you can adjust that
model <- glm(Target~.,family=binomial(link='logit'),data=perc_change2)
With that model, you can go on to evaluate performance and so on. Please let me know if you need more details.
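For the evaluation step, a minimal sketch could look like the following. It runs on synthetic data, so x1, x2 and the way Target is generated are purely hypothetical, and the 0.5 cutoff is an assumed default rather than a recommendation. It uses predict() with type = "response" to get probabilities and table() for a confusion matrix.

```r
# Synthetic data: 125 sessions, two hypothetical predictors
set.seed(123)
n <- 125
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Hypothetical target loosely related to x1, plus noise
dat$Target <- as.integer(dat$x1 + rnorm(n) > 0)

trainset <- dat[1:85, ]    # first 85 sessions
testset  <- dat[86:125, ]  # last 40 sessions

model <- glm(Target ~ ., family = binomial(link = "logit"), data = trainset)

# Predicted probability of Target == 1 for each test row
prob <- predict(model, newdata = testset, type = "response")

# Turn probabilities into labels with an assumed 0.5 cutoff
pred   <- ifelse(prob > 0.5, "increase", "decrease")
actual <- ifelse(testset$Target == 1, "increase", "decrease")

conf_mat <- table(Predicted = pred, Actual = actual)
accuracy <- mean(pred == actual)
```

Printing conf_mat and accuracy gives a first impression of how the model does on the held-out sessions; the cutoff can be moved if one kind of error matters more than the other.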