libsvm / e1071: Getting non-binary prediction value for binary class?


In my data, the last column gives the status of each sample, which is either diseased (1) or disease-free (0). The goal is to classify the test samples as diseased (1) or disease-free (0), but the predictions come out as "0.2189325" and "0.1674805" instead of 0 or 1.

sample.train.data <- structure(list(V1 = c(0.0504799681418526, 0.0674893975400467),
    V2 = c(0.375190991689635, 2.62836587379837e-07), V3 = c(0,
    0), V4 = c(0, 0), V5 = c(0, 0.123349117705797), V6 = c(0,
    0), V7 = c(0.0575526864592394, 4.0318003466356e-08), V8 = c(0,
    0), V9 = c(0, 0.0819121309767076), V10 = c(0.0837245737400836,
    5.8652477615664e-08), V11 = c(0, 0), V12 = c(0, 0), V13 = c(0,
    0), V14 = c(0, 0), V15 = c(0, 0), V16 = c(0, 0), V17 = c(0,
    0), V18 = c(0.0115973088249164, 8.12438769013043e-09), V19 = c(0,
    0), V20 = c(0, 0), V21 = c(0, 0.0642970332370127), V22 = c(0,
    0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0), V26 = c(0,
    0), V27 = c(0, 0), V28 = c(0, 0), V29 = c(0, 0), V30 = c(0,
    0), V31 = c(0, 0.100087661334886), V32 = c(0, 0), V33 = c(0,
    0), V34 = c(0.132277333556899, 9.2665665514059e-08), V35 = c(0.00157299602821123,
    1.1019478536923e-09), V36 = c(0.121318235645494, 0.162196905737495
    ), V37 = c(0, 0), V38 = c(0.0661915890298985, 0.088495112621564
    ), V39 = c(0.10009431688377, 0.133821501722926), V40 = c(0,
    0.039928021903824), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0,
    0), V44 = c(0, 0), V45 = c(0, 0.105729116180691), V46 = c(0,
    0), V47 = c(0, 0), V48 = c(0, 0), V49 = c(0, 0), V50 = c(0,
    0.0230295773750142), V51 = c(0, 0.00966395996496688), V52 = c(0,
    0), V53 = c(0, 0), V54 = c(0, 1)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 1:2, class = "data.frame")

sample.test.data <- structure(list(V1 = c(0, 0.0502553931936882), V2 = c(0.32474835570625,
0.373521844489033), V3 = c(0, 0), V4 = c(0, 0), V5 = c(0.0798572088141946,
0.09185084822725), V6 = c(0, 0), V7 = c(0, 0), V8 = c(0.0913439079721602,
4.76496954607063e-08), V9 = c(0, 0), V10 = c(0.0724682048784116,
0.0833521004105655), V11 = c(0, 0), V12 = c(0, 0.00380492674778399
), V13 = c(0, 0), V14 = c(0.0300930020345612, 1.56980625668248e-08
), V15 = c(0.022461356489053, 1.17170024810405e-08), V16 = c(0.037002165179523,
0.0425594671846318), V17 = c(0, 0), V18 = c(0.0100381060711198,
5.23639406184491e-09), V19 = c(0, 0), V20 = c(0, 0), V21 = c(0,
0), V22 = c(0, 0), V23 = c(0, 0), V24 = c(0, 0), V25 = c(0, 0.0150866858339266
), V26 = c(0, 0.0282083101023333), V27 = c(0, 0), V28 = c(0,
0), V29 = c(0, 0), V30 = c(0, 0), V31 = c(0, 0.0745294069522065
), V32 = c(0, 0), V33 = c(0, 0), V34 = c(0.114493278147107, 0.131688859030858
), V35 = c(0, 0), V36 = c(0.105007578581866, 5.47773710537665e-08
), V37 = c(0, 0), V38 = c(0, 0), V39 = c(0.0866371142792093,
0.0996490179492987), V40 = c(0.0258497218465435, 1.34845486806539e-08
), V41 = c(0, 0), V42 = c(0, 0), V43 = c(0, 0), V44 = c(0, 0.00549299131535034
), V45 = c(0, 0), V46 = c(0, 0), V47 = c(0, 0), V48 = c(0, 0),
    V49 = c(0, 0), V50 = c(0, 0), V51 = c(0, 0), V52 = c(0, 0
    ), V53 = c(0, 0), V54 = c(0, 0)), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", "V21",
"V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", "V30",
"V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", "V39",
"V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", "V48",
"V49", "V50", "V51", "V52", "V53", "V54"), row.names = 81:82, class = "data.frame")

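library(e1071)  # provides svm()

# The last column (V54) holds the disease status; build the formula "V54 ~ . "
disease.col <- paste("V", ncol(sample.train.data), sep = "")
f <- paste(disease.col, " ~ . ", sep = "")
svm.model <- svm(as.formula(f), data = sample.train.data, cost = 100, gamma = 1)
# Predict on the test features only (status column dropped)
svm.pred <- predict(svm.model, sample.test.data[, -ncol(sample.test.data)])
comp.table <- table(pred = svm.pred, true = sample.test.data[, ncol(sample.test.data)])
print(comp.table)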

Output:

                  true
pred               0
  0.16748052821151 1
  0.21893247843041 1

As you can see, the predicted values are 0.167 and 0.218, whereas each sample should be classified as either 0 or 1, which is how the training data passed to svm is labeled.

NOTE: What I have posted here is only an excerpt with two samples each of training and test data; the actual training data has 80 samples and the test data has 20. Also, the warning message produced when creating svm.model from this excerpt does not occur with the actual data.

I have tried different values of cost and gamma for the SVM model and different combinations of data; even when the test data includes the sample status (0 or 1), I still get similar results. I would deeply appreciate it if someone could tell me what I am doing wrong.


Solution

  • Your response variable should be a factor in order to trigger the classification behavior. With a numeric response, svm() defaults to regression and returns continuous predictions. In your example that would be

    sample.train.data$V54<-factor(sample.train.data$V54)
    

    That will convert V54 from numeric to factor. Then you can run the rest of the code exactly the same way.
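To see the difference side by side, here is a minimal sketch on made-up data. The `train` and `test` data frames and the `x1`, `x2`, and `status` columns are hypothetical and not taken from the question; only `svm()` and `predict()` from e1071 are assumed. The same model is fit twice, once with a numeric 0/1 response and once after converting it to a factor.

```r
library(e1071)

# Hypothetical toy data: two numeric features and a 0/1 status column.
set.seed(1)
train <- data.frame(
  x1 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  x2 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  status = rep(c(0, 1), each = 20)
)
test <- data.frame(x1 = c(0.1, 2.9), x2 = c(-0.2, 3.1))

# Numeric response: svm() defaults to eps-regression,
# so the predictions are real-valued.
reg.model <- svm(status ~ ., data = train, cost = 100, gamma = 1)
reg.pred <- predict(reg.model, test)

# Factor response: svm() defaults to C-classification,
# so the predictions are class labels.
train$status <- factor(train$status)
cls.model <- svm(status ~ ., data = train, cost = 100, gamma = 1)
cls.pred <- predict(cls.model, test)

print(is.numeric(reg.pred))  # regression output is numeric
print(is.factor(cls.pred))   # classification output is a factor
print(levels(cls.pred))      # levels come from the training response
```

When building a comparison table afterwards, it may help to convert the true test labels to a factor with the same levels as the predictions, so that classes absent from a small test set still appear as rows or columns.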