R NaiveBayes Classifier won't read type as "formula"


I am using the NaiveBayes function in the klaR package, and for some reason it won't treat my input as a formula. I understand that NaiveBayes has two methods: a "default" method and one for input of class "formula". When I run it, my formula seems to be handled by the default method, which throws an error. My code is as follows:

trainData <- read.csv("train.txt")
trainNB <- NaiveBayes(Type~., data = trainData)

The error that I received after running these lines is:

Error in NaiveBayes.default(X, Y, ...) :
  grouping/classes object must be a factor

trainData is a data frame, and its first 10 rows are as follows (there are 83 rows in total):

    Area Perimeter Compactness Length Width Asymmetry Groove Type
1  14.80     14.52      0.8823  5.656 3.288    3.1120  5.309    1
2  14.79     14.52      0.8819  5.545 3.291    2.7040  5.111    1
3  14.99     14.56      0.8883  5.570 3.377    2.9580  5.175    1
4  19.14     16.61      0.8722  6.259 3.737    6.6820  6.053    0
5  15.69     14.75      0.9058  5.527 3.514    1.5990  5.046    1
6  14.11     14.26      0.8722  5.520 3.168    2.6880  5.219    1
7  13.16     13.55      0.9009  5.138 3.201    2.4610  4.783    1
8  16.16     15.33      0.8644  5.845 3.395    4.2660  5.795    0
9  15.01     14.76      0.8657  5.789 3.245    1.7910  5.001    1
10 14.11     14.10      0.8911  5.420 3.302    2.7000  5.000    1

Any help would be greatly appreciated. Thank you!


Solution

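  • Your dependent variable is most likely not a factor. read.csv reads a 0/1 column such as Type as an integer, and the NaiveBayes.default(X, Y, ...) call in the error suggests that the formula method was used and then passed the response on to the default method, which requires the grouping variable to be a factor. Convert the column before fitting:

    trainData$Type <- as.factor(trainData$Type)

    Consider this toy example, which uses naiveBayes from the e1071 package: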

    library(e1071)
    m <- naiveBayes(Species ~ ., data = iris)
    

    If you look at the structure of iris, you will see that Species, the dependent variable in this example, is already a factor:

    > str(iris)
    'data.frame':   150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    >
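
Applied to the data in the question, the fix might look like the sketch below. It rebuilds only the ten rows shown in the question as an inline data frame (an assumption made so the example runs without train.txt; the real file has 83 rows) and stores Type as an integer, which is how read.csv would typically read a 0/1 column.

```r
library(klaR)  # provides NaiveBayes()

# The ten rows shown in the question, rebuilt inline (stand-in for train.txt)
trainData <- data.frame(
  Area        = c(14.80, 14.79, 14.99, 19.14, 15.69, 14.11, 13.16, 16.16, 15.01, 14.11),
  Perimeter   = c(14.52, 14.52, 14.56, 16.61, 14.75, 14.26, 13.55, 15.33, 14.76, 14.10),
  Compactness = c(0.8823, 0.8819, 0.8883, 0.8722, 0.9058, 0.8722, 0.9009, 0.8644, 0.8657, 0.8911),
  Length      = c(5.656, 5.545, 5.570, 6.259, 5.527, 5.520, 5.138, 5.845, 5.789, 5.420),
  Width       = c(3.288, 3.291, 3.377, 3.737, 3.514, 3.168, 3.201, 3.395, 3.245, 3.302),
  Asymmetry   = c(3.1120, 2.7040, 2.9580, 6.6820, 1.5990, 2.6880, 2.4610, 4.2660, 1.7910, 2.7000),
  Groove      = c(5.309, 5.111, 5.175, 6.053, 5.046, 5.219, 4.783, 5.795, 5.001, 5.000),
  Type        = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# Type is an integer column at this point, which NaiveBayes rejects
class(trainData$Type)

# Convert the response to a factor, then fit with the formula interface
trainData$Type <- as.factor(trainData$Type)
trainNB <- NaiveBayes(Type ~ ., data = trainData)
```

With the full file, the same two steps apply: call as.factor on Type right after read.csv, then fit the model.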