Search code examples
rr-caretnaivebayes

R caret naïve bayes accuracy is null


I have one dataset to train with SVM and Naïve Bayes. SVM works, but Naïve Bayes doesn't work. Follow de source code below:

library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)
library(caret)

CORES <- 5 #Optional
registerDoMC(CORES) #Optional

load("chat/rdas/2gram-entidades-erro.Rda")

set.seed(10)
split=0.60

maFinal$resposta <- as.factor(maFinal$resposta)
data_train <- as.data.frame(unclass(maFinal[ trainIndex,]))
data_test <- maFinal[-trainIndex,]

treegram25NotNull <- train(x = subset(data_train, select = -c(resposta)),
      y = data_train$resposta, 
      method = "nb",
      trControl = trainControl(method = "cv", number = 5, savePred=T, sampling = "up"))

treegram25NotNull

The final accuracy is null yy

Warning messages: 1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures. 2: In train.default(subset(data_train, select = -c(resposta)), data_train$resposta, : missing values found in aggregated results

Any help would be greatly appreciated, thanks.


Solution

  • The fix is really simple:

    set.seed(10)
    split <- 0.60
    maFinal[] <- lapply(maFinal, as.factor)
    

    Currently all your variables, except for resposta, are numeric. However, they have only up to 12~ distinct values, meaning that they all actually should be factor variables. Also, many of them are highly unbalanced. Then, when splitting the sample, the issue arises from treating (actually factor) variables with only a single unique value as continuous variables.