Tags: r, machine-learning, dataset, logistic-regression

Applying logistic regression to a simple dataset


I have been trying to apply logistic regression, or any other ML algorithm, to this simple data set, but I have failed miserably and got many errors. I am tr

 dim(data)
 [1] 11580    12

 head(data)
      ReturnJan   ReturnFeb   ReturnMar   ReturnApr    ReturnMay  ReturnJune
  1  0.08067797  0.06625000  0.03294118  0.18309859  0.130333952 -0.01764234
  2 -0.01067989  0.10211539  0.14549595 -0.08442804 -0.327300392 -0.35926605
  3  0.04774193  0.03598972  0.03970223 -0.16235294 -0.147426982  0.04858934
  4 -0.07404022 -0.04816956  0.01821862 -0.02467917 -0.006036217 -0.02530364
  5 -0.03104575 -0.21267723  0.09147609  0.18933823 -0.153846154 -0.10611511
  6  0.57980016  0.33225225 -0.40546095 -0.06000000  0.060732113 -0.21536106

The 12th column, the one I am trying to predict, looks like this:

      PositiveDec
      0
      0
      0
      1
      1
      1

Here is my attempt:

new.data <- data[,-12] #Remove labels' column

index <- sample(1:nrow(new.data), size = 0.8*nrow(new.data))#Split data

train.data <- new.data[index,]

test.data <- new.data[-index,]

fit.glm <- glm(data[,12]~.,data = data, family = "binomial")

Solution

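  • You are nearly there, but the call has a few errors and, as pointed out in the comments, you need to keep the outcome variable in the data frame you pass to glm(). Writing data[,12] ~ . with data = data also leaves PositiveDec among the predictors, because the dot expands to every column of data that is not named on the left-hand side. Name the outcome directly and fit on the training rows only. This should work: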

    index <- sample(1:nrow(data), size = 0.8 * nrow(data))
    train.data <- data[index, ]
    fit.glm <- glm(PositiveDec ~ ., data = train.data, family = "binomial")
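
The answer stops at fitting the model. As a rough sketch of how the held-out 20% could then be used, the block below builds a synthetic stand-in for the data (the real data set is not available here, so the simulated values and the column names ReturnJan to ReturnNov are assumptions), fits the same model, and scores the test rows. The 0.5 cutoff is also just an illustrative choice.

```r
set.seed(42)  # make the simulated data and the random split reproducible

# Synthetic stand-in for the real data (assumption: 11 numeric return
# columns plus a 0/1 PositiveDec outcome; names and values are made up).
n <- 1000
data <- as.data.frame(matrix(rnorm(n * 11, sd = 0.1), ncol = 11))
names(data) <- paste0("Return", month.abb[1:11])
data$PositiveDec <- rbinom(n, size = 1, prob = plogis(5 * data$ReturnNov))

# 80/20 split on the full data frame, outcome column included
index <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
train.data <- data[index, ]
test.data  <- data[-index, ]

# Same model as in the answer: outcome named on the left, "." for the rest
fit.glm <- glm(PositiveDec ~ ., data = train.data, family = "binomial")

# Predicted probabilities for the held-out rows, thresholded at 0.5
test.prob <- predict(fit.glm, newdata = test.data, type = "response")
test.pred <- as.integer(test.prob > 0.5)

# Confusion matrix and accuracy on the test set
print(table(predicted = test.pred, actual = test.data$PositiveDec))
accuracy <- mean(test.pred == test.data$PositiveDec)
print(accuracy)
```

With the real data frame, only the lines from the split onward would be needed. Note that type = "response" is what makes predict() return probabilities rather than values on the log-odds scale.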