I'm working with IBM stock data obtained through quantmod. I created two variables and then used the glm function to model the relationship between them. The code ran without errors, but then I noticed that part of the resulting data frame contains NAs. How can I overcome this issue?
Here is my code:
library("quantmod")
getSymbols("IBM")
dim(IBM)
IBM$CurrtDay_up <- ifelse(IBM$IBM.Open < IBM$IBM.Close,1,0)
IBM$LastDay_green <- ifelse((lag(IBM$IBM.Open,k=1) < lag(IBM$IBM.Close,k=1)),1,0)
head(IBM)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green
2007-01-03 97.18 98.40 96.26 97.27 9196800 82.78498 1 NA
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1
2007-01-10 98.50 99.05 97.93 98.89 8744800 84.16374 1 1
Then I added the glm function:
IBM_1 <- IBM[3:1000,] # to avoid the first row's NA.
glm_greenDay <- glm(CurrtDay_up~LastDay_green,data=IBM_1,family=binomial(link='logit'))
IBM_1$glm_pred<-predict(glm_greenDay,newdata=IBM_1,type='response')
head(IBM_1)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred
2007-01-04 NA NA NA NA NA NA NA NA 0.5683453
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 NA
2007-01-07 NA NA NA NA NA NA NA NA 0.5407240
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 NA
2007-01-08 NA NA NA NA NA NA NA NA 0.5683453
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1 NA
UPDATED CODE (please notice that one row (row #2) has been duplicated):
IBM_1<-IBM[complete.cases(IBM[1:1000,]),] # to avoid the first row's NA.
glm_greenDay<-glm(CurrtDay_up~LastDay_green,data=IBM_1,family=binomial(link='logit'))
IBM_1$glm_pred<-glm_greenDay$fitted.values
head(IBM_1)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred
2007-01-03 NA NA NA NA NA NA NA NA 0.5691203
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1 NA
2007-01-04 NA NA NA NA NA NA NA NA 0.5691203
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 NA
2007-01-07 NA NA NA NA NA NA NA NA 0.5407240
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 NA
The problem arises because the output of predict() is not an xts object. The elements of the vector of predicted values have dates for names, but it is still a plain numeric vector without a time index. I was able to get a simple call to merge() to work, without dropping NAs before modeling, by first converting the output of predict() to class xts:
library(quantmod)
getSymbols("IBM")
IBM$CurrtDay_up <- ifelse(IBM$IBM.Open < IBM$IBM.Close, 1, 0)
IBM$LastDay_green <- ifelse((lag(IBM$IBM.Open, k=1) < lag(IBM$IBM.Close, k=1)), 1, 0)
glm_greenDay <- glm(CurrtDay_up~LastDay_green, data=IBM, family=binomial(link='logit'), na.action=na.exclude)
glm_pred <- predict(glm_greenDay, type='response')
glm_pred_xts <- xts(x = glm_pred, order.by = as.Date(names(glm_pred)))
IBM2 <- merge(IBM, glm_pred_xts)
That seems to produce the desired output:
> head(glm_pred)
2007-01-03 2007-01-04 2007-01-05 2007-01-08 2007-01-09 2007-01-10
NA 0.5383952 0.5383952 0.5383065 0.5383952 0.5383952
> head(IBM2)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred_xts
2007-01-03 97.18 98.40 96.26 97.27 9196800 82.78498 1 NA NA
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1 0.5383952
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 0.5383952
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 0.5383065
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1 0.5383952
2007-01-10 98.50 99.05 97.93 98.89 8744800 84.16374 1 1 0.5383952
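The na.action=na.exclude argument in the glm() call above is what keeps the predictions the same length as the data. Here is a minimal base R sketch of that behavior, assuming a small made-up data frame (the column names y and x and the values are hypothetical, not the IBM series):

```r
# Hypothetical toy data with dates as row names, standing in for the xts rows.
d <- data.frame(
  y = rep(c(0, 1, 1, 0, 1), 4),
  x = rep(c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0), 2)
)
rownames(d) <- as.character(as.Date("2007-01-03") + 0:19)
d$x[1] <- NA  # mimic the NA that lag() leaves in the first row

# na.omit drops the incomplete row from the fit and from the predictions.
fit_omit <- glm(y ~ x, data = d, family = binomial(link = "logit"),
                na.action = na.omit)

# na.exclude also drops it from the fit, but pads predictions back with NA.
fit_excl <- glm(y ~ x, data = d, family = binomial(link = "logit"),
                na.action = na.exclude)

pred_omit <- predict(fit_omit, type = "response")
pred_excl <- predict(fit_excl, type = "response")

length(pred_omit)  # 19: one value per complete row
length(pred_excl)  # 20: one value per original row, NA for the first
head(pred_excl, 3)
```

Because the padded vector keeps one named element per original row, converting it with xts() and as.Date(names(...)) as shown above should line up with the original index row for row. Note that fit_excl$fitted.values is the raw, unpadded component; fitted(fit_excl) or predict(fit_excl) is needed to get the padded version.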