SVM Classification with Caret Error (Basic)


I am probably making a very simple (and stupid) mistake here, but I cannot figure it out. I am playing with some data from Kaggle (Digit Recognizer) and trying to use an SVM with the caret package for classification. If I pass the label values to train() as numeric, caret seems to default to regression, and performance is quite poor. So I next converted the labels to a factor with factor() and tried to run SVM classification. Here is some code that generates dummy data and passes it to caret:

library(caret)
library(doMC)
registerDoMC(cores = 4)

ytrain <- factor(sample(0:9, 1000, replace=TRUE))
xtrain <- matrix(runif(252 * 1000,0 , 255), 1000, 252)

preProcValues <- preProcess(xtrain, method = c("center", "scale"))
transformerdxtrain <- predict(preProcValues, xtrain)

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
svmFit <- train(transformerdxtrain[1:10,], ytrain[1:10], method = "svmradial")

I get this error:

Error in kernelMult(kernelf(object), newdata, xmatrix(object)[[p]], coef(object)[[p]]) : 
  dims [product 20] do not match the length of object [0]
In addition: Warning messages:
1: In train.default(transformerdxtrain[1:10, ], ytrain[1:10], method = "svmradial") :
  At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1, X2, X3, X4, X5, X6, X7, X8, X9
2: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
  There were missing values in resampled performance measures.

Can somebody tell me what I am doing wrong? Thank you!


Solution

  • You have 10 different classes, yet you are passing only 10 cases to train(). This means that when train() resamples those cases, the individual resamples will frequently not contain all 10 classes, so the SVMs fitted on different resamples are trained on different sets of classes. train() is having difficulty combining the results of these varying-category SVMs.

    You can fix this by some combination of increasing the number of cases, decreasing the number of classes, or using a different classifier.
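
    As a rough illustration of the point above, here is a minimal sketch, not a definitive fix. Part 1 uses only base R to count how many distinct classes show up in bootstrap resamples of 10 cases (bootstrap with 25 repetitions is, as far as I know, caret's default when no trControl is given). Part 2 fits the model on more cases. The sizes (500 cases, 20 predictors), the "d" prefix on the class labels, and the "px" column names are my own assumptions, chosen so the example runs quickly and the class levels are valid R names. It also spells the method "svmRadial", as in caret's model list, and actually passes the control object to train(), which the code in the question defines but never uses.

```r
library(caret)  # method = "svmRadial" also needs the kernlab package installed

set.seed(42)

# Part 1: how many distinct classes survive a bootstrap resample of 10 cases?
ytrain  <- factor(sample(0:9, 1000, replace = TRUE))
y_small <- ytrain[1:10]          # 10 cases, but the factor keeps all 10 levels
n_classes <- replicate(25, {
  idx <- sample(length(y_small), replace = TRUE)  # one bootstrap resample
  length(unique(y_small[idx]))                    # classes present in it
})
print(table(n_classes))          # never more than 10, usually far fewer

# Part 2: same idea as the question, but with enough cases per class.
n <- 500                         # assumed size, chosen for speed
p <- 20                          # assumed number of predictors
y_big <- factor(paste0("d", sample(0:9, n, replace = TRUE)))  # valid level names
x_big <- matrix(runif(n * p, 0, 255), n, p)
colnames(x_big) <- paste0("px", seq_len(p))  # give the predictors names

fitControl <- trainControl(method = "cv", number = 5)
svmFit <- train(x_big, y_big,
                method     = "svmRadial",
                preProcess = c("center", "scale"),
                trControl  = fitControl,
                tuneLength = 3)
print(svmFit)
```

    Because the data here are random noise, the reported accuracy should hover around chance (roughly 0.1); the point is only that each fold now contains every class, so resampling can complete.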