I am working with the wbca data from the faraway package. The prior probability of sampling a malignant tumor is π0 = 1/3 and the prior probability for sampling a benign tumor is π1 = 2/3.
I am trying to use a naive Bayes classifier with multinomial distributions for the features, and to find out, using LOOCV, whether some subset of the 9 features classifies better than the full set.
I am unsure where to even begin with this, so any R code help would be great. Thanks!
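You can try something like the code below, which uses caret's recursive feature elimination (rfe) with a naive Bayes model and leave-one-out cross-validation. Note that caret's nbFuncs fits klaR::NaiveBayes with kernel density estimates for the predictors (not multinomials) and with class priors estimated from the data, so it might not be the most accurate model for your setup, but it's something you can start with: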
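library(faraway)
library(caret)
library(klaR)   # caret's nbFuncs fits klaR::NaiveBayes (kernel densities)

# Predictors: every column except the class label
x <- wbca[, !grepl("Class", colnames(wbca))]
# Response: 0 = malignant, 1 = benign
y <- factor(wbca$Class)

# Subset sizes to evaluate; the full set of 9 features is always evaluated too
subsets <- 2:8

ctrl <- rfeControl(functions = nbFuncs,
                   method = "LOOCV")

bayesProfile <- rfe(x, y,
                    sizes = subsets,
                    rfeControl = ctrl)
bayesProfile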
Recursive feature selection

Outer resampling method: Leave-One-Out Cross-Validation

Resampling performance over subset size:

 Variables Accuracy  Kappa Selected
         2   0.9501 0.8891
         3   0.9648 0.9225
         4   0.9648 0.9223
         5   0.9677 0.9290
         6   0.9750 0.9454        *
         7   0.9692 0.9322
         8   0.9750 0.9455
         9   0.9662 0.9255

The top 5 variables (out of 6):
   USize, UShap, BNucl, Chrom, Epith
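By default rfe simply picks the size with the best resampled accuracy, but the profile is quite flat: sizes 3 to 9 are all within about one percentage point of each other. caret ships a "smallest size within a tolerance" rule, pickSizeTolerance, which can be used as the selectSize element of the functions list. The helper below is a hypothetical base-R sketch of that same idea (not caret's code), applied to the accuracy values printed in the profile, so you can see how sensitive the chosen size is to the tolerance:

```r
# Hypothetical helper (not part of caret): smallest subset size whose
# resampled accuracy is within `tol` percent of the best accuracy.
smallest_within_tol <- function(profile, tol = 1.5) {
  best <- max(profile$Accuracy)
  loss <- (best - profile$Accuracy) / best * 100   # % below the best accuracy
  min(profile$Variables[loss <= tol])
}

# Accuracy values copied from the printed rfe profile.
profile <- data.frame(
  Variables = 2:9,
  Accuracy  = c(0.9501, 0.9648, 0.9648, 0.9677, 0.9750, 0.9692, 0.9750, 0.9662)
)

smallest_within_tol(profile, tol = 1.5)  # a looser tolerance allows a smaller subset
smallest_within_tol(profile, tol = 0.5)  # a tight tolerance keeps the best size
```

With a 1.5% tolerance a 3-variable model is accepted, while a 0.5% tolerance keeps the 6-variable choice, which is a reminder that the gap between "best subset" and "full set" here is small.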
You can get the selected (optimal) variables with:
bayesProfile$optVariables
[1] "USize" "UShap" "BNucl" "Chrom" "Epith" "Thick"
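The rfe approach uses kernel densities and priors estimated from the data, whereas the question asks for multinomial features with fixed priors π0 = 1/3 and π1 = 2/3. Below is a minimal base-R sketch of that model, under my own assumptions: each feature is treated as categorical with the levels observed in the data, and Laplace smoothing with pseudo-count alpha = 1 is applied. The rule predicts argmax over c of log π_c + Σ_j log p̂_j(x_j | c). Because the priors are fixed, the only fitted quantities are class-by-level counts, so exact LOOCV just subtracts each row's own contribution instead of refitting n times. That makes an exhaustive search over all 2^9 - 1 = 511 subsets cheap, rather than the greedy elimination rfe performs. The function names loo_loglik, loo_accuracy and all_subsets are hypothetical helpers, not from any package.

```r
# Categorical ("multinomial") naive Bayes with fixed class priors and exact LOOCV.
# Assumptions: features are categorical (levels as observed), Laplace smoothing `alpha`.

# Leave-one-out log-likelihoods, array [n, classes, features]:
# entry [i, k, j] = log p(x_ij | class k) estimated without row i.
loo_loglik <- function(X, y, alpha = 1) {
  y <- factor(y)
  classes <- levels(y)
  n <- nrow(X); p <- ncol(X)
  out <- array(0, dim = c(n, length(classes), p),
               dimnames = list(NULL, classes, colnames(X)))
  for (j in seq_len(p)) {
    xj <- factor(X[[j]])
    K <- nlevels(xj)
    counts <- table(y, xj)                      # class-by-level counts, all rows
    for (k in seq_along(classes)) {
      own  <- as.numeric(y == classes[k])       # 1 if row i is in class k
      n_ck <- counts[k, as.integer(xj)] - own   # count of row i's level, row i removed
      n_c  <- sum(counts[k, ]) - own            # class size, row i removed
      out[, k, j] <- log((n_ck + alpha) / (n_c + alpha * K))
    }
  }
  out
}

# LOOCV accuracy of the naive Bayes rule restricted to the features in `vars`.
loo_accuracy <- function(ll, y, prior, vars) {
  y <- factor(y)
  scores <- rowSums(ll[, , vars, drop = FALSE], dims = 2)  # n x classes
  scores <- sweep(scores, 2, log(prior), "+")              # add log priors
  pred <- levels(y)[max.col(scores, ties.method = "first")]
  mean(pred == as.character(y))
}

# Evaluate every non-empty feature subset, best accuracy first (smaller size breaks ties).
all_subsets <- function(ll, y, prior) {
  feats <- dimnames(ll)[[3]]
  p <- length(feats)
  res <- lapply(seq_len(2^p - 1), function(m) {
    keep <- feats[bitwAnd(m, 2^(seq_len(p) - 1)) > 0]     # bits of m select features
    data.frame(size = length(keep),
               vars = paste(keep, collapse = ","),
               accuracy = loo_accuracy(ll, y, prior, keep))
  })
  res <- do.call(rbind, res)
  res[order(-res$accuracy, res$size), ]
}

# Usage on wbca (runs only if faraway is installed).
if (requireNamespace("faraway", quietly = TRUE)) {
  data("wbca", package = "faraway")
  X <- wbca[, setdiff(names(wbca), "Class")]
  y <- factor(wbca$Class, levels = c(0, 1))   # 0 = malignant, 1 = benign
  prior <- c(1/3, 2/3)                         # pi0, pi1 from the question
  ll <- loo_loglik(X, y)
  print(loo_accuracy(ll, y, prior, colnames(X)))  # full set of 9 features
  print(head(all_subsets(ll, y, prior)))          # best subsets
}
```

Keep in mind that taking the best of 511 LOOCV accuracies is itself optimistic: a subset that beats the full set by a handful of cases in this search may not do so on new data.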