I am using the NaiveBayes function from the klaR package, and for some reason it won't treat my input as a formula. I understand that NaiveBayes has two methods: a "default" method and one for input of class "formula". When I run NaiveBayes, it seems to read my formula as the default type and throws an error instead. My code is as follows:
trainData <- read.csv("train.txt")
trainNB <- NaiveBayes(Type~., data = trainData)
The error that I received after running these lines is:
Error in NaiveBayes.default(X, Y, ...) :
grouping/classes object must be a factor
trainData is a data frame, and its first 10 rows are as follows (there are 83 rows in total):
Area Perimeter Compactness Length Width Asymmetry Groove Type
1 14.80 14.52 0.8823 5.656 3.288 3.1120 5.309 1
2 14.79 14.52 0.8819 5.545 3.291 2.7040 5.111 1
3 14.99 14.56 0.8883 5.570 3.377 2.9580 5.175 1
4 19.14 16.61 0.8722 6.259 3.737 6.6820 6.053 0
5 15.69 14.75 0.9058 5.527 3.514 1.5990 5.046 1
6 14.11 14.26 0.8722 5.520 3.168 2.6880 5.219 1
7 13.16 13.55 0.9009 5.138 3.201 2.4610 4.783 1
8 16.16 15.33 0.8644 5.845 3.395 4.2660 5.795 0
9 15.01 14.76 0.8657 5.789 3.245 1.7910 5.001 1
10 14.11 14.10 0.8911 5.420 3.302 2.7000 5.000 1
Any help would be greatly appreciated. Thank you!
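I think your dependent variable is not a factor. Type is read in as a number (0/1), so NaiveBayes rejects it as a grouping variable. The call shown in the error, NaiveBayes.default(X, Y, ...), suggests that the formula method was in fact used and then handed the data on to the default method, which is where the factor check fails. Convert the column before fitting:
trainData$Type <- as.factor(trainData$Type)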
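As a minimal sketch of the fix above: the snippet below types in a few rows of the posted data by hand (only three of the predictor columns, and only so that it runs without train.txt), converts Type to a factor, and then fits the model. The klaR call is guarded so that the rest still runs if the package is not installed. The inline data frame is an assumption for illustration only.

```r
# Rows 1, 3, 4, 5 and 8 from the question, entered by hand so the example
# is self-contained; in practice this would come from read.csv("train.txt").
trainData <- data.frame(
  Area      = c(14.80, 14.99, 19.14, 15.69, 16.16),
  Perimeter = c(14.52, 14.56, 16.61, 14.75, 15.33),
  Groove    = c(5.309, 5.175, 6.053, 5.046, 5.795),
  Type      = c(1L, 1L, 0L, 1L, 0L)
)

# Type is a plain number at this point, not a factor
is.factor(trainData$Type)

# Convert the response to a factor before fitting
trainData$Type <- as.factor(trainData$Type)
levels(trainData$Type)

# Fit only if klaR is available
if (requireNamespace("klaR", quietly = TRUE)) {
  trainNB <- klaR::NaiveBayes(Type ~ ., data = trainData)
}
```

With the full 83-row file, only the as.factor line needs to be added between your read.csv and NaiveBayes calls.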
Consider this toy example, which uses naiveBayes from the e1071 package rather than klaR's NaiveBayes:
library(e1071)
m <- naiveBayes(Species ~ ., data = iris)
If you look at the structure of iris, you will see that Species, the dependent variable here, is a factor:
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
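The same str() check can be run on your own data. As a sketch, assuming the file is comma-separated with a header row (the text below is a hypothetical stand-in for train.txt, since the actual file layout is not shown), read.csv can also be told to read Type as a factor directly through colClasses, which avoids the separate conversion step:

```r
# Hypothetical stand-in for the contents of train.txt
csvText <- "Area,Perimeter,Groove,Type
14.80,14.52,5.309,1
19.14,16.61,6.053,0
16.16,15.33,5.795,0"

# Read Type as a factor at import time; the other columns are guessed as usual
trainData <- read.csv(text = csvText, colClasses = c(Type = "factor"))

# Inspect the structure, as with iris above
str(trainData)
is.factor(trainData$Type)
```

With a real file, text = csvText would be replaced by the file name, for example read.csv("train.txt", colClasses = c(Type = "factor")).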