How to use LOOCV to find a feature subset that classifies better than the full set in R

I am working with the wbca data from the faraway package. The prior probability of sampling a malignant tumor is π0 = 1/3 and the prior probability for sampling a benign tumor is π1 = 2/3.

I am trying to use the naive Bayes classifier with multinomials to see if there is a good subset of the 9 features that classifies better than the full set using LOOCV.

I am unsure where to even begin with this, so any R code help would be great. Thanks!

Solution

You can try something like the code below. The kernel density estimate used for your predictors might not be the most accurate, but it's something you can start with:

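library(faraway)     # provides the wbca data
library(naivebayes)
library(caret)       # rfe(), rfeControl(), nbFuncs
# note: nbFuncs fits the model through klaR::NaiveBayes, so klaR must be installed

# predictors: every column except the response
x <- wbca[, !grepl("Class", colnames(wbca))]
y <- factor(wbca$Class)

# subset sizes to evaluate; rfe() always adds the full set (all 9 predictors)
subsets <- 2:8

# recursive feature elimination with naive Bayes, scored by leave-one-out CV
ctrl <- rfeControl(functions = nbFuncs,
                   method = "LOOCV")

bayesProfile <- rfe(x, y,
                    sizes = subsets,
                    rfeControl = ctrl)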

bayesProfile

Recursive feature selection

Outer resampling method: Leave-One-Out Cross-Validation 

Resampling performance over subset size:

 Variables Accuracy  Kappa Selected
         2   0.9501 0.8891         
         3   0.9648 0.9225         
         4   0.9648 0.9223         
         5   0.9677 0.9290         
         6   0.9750 0.9454        *
         7   0.9692 0.9322         
         8   0.9750 0.9455         
         9   0.9662 0.9255         

The top 5 variables (out of 6):
   USize, UShap, BNucl, Chrom, Epith
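To answer the "better than the full set" part directly, you can compare the best row of that table against the row for all 9 variables. The sketch below copies the Accuracy column from the printed output so that it runs on its own; with the fitted object, the same table should be available as bayesProfile$results. The names res, full and best are just illustrative.

```r
# Accuracy values copied from the printed rfe output above
res <- data.frame(
  Variables = 2:9,
  Accuracy  = c(0.9501, 0.9648, 0.9648, 0.9677, 0.9750, 0.9692, 0.9750, 0.9662)
)

# LOOCV accuracy with all 9 predictors
full <- res$Accuracy[res$Variables == max(res$Variables)]

# which.max() returns the first maximum, i.e. the smallest size among ties
best <- res[which.max(res$Accuracy), ]

best$Variables                    # subset size with the highest accuracy
round(best$Accuracy - full, 4)    # gain over the full set
```

The gain here is under one percentage point, so the subset is better than the full set, but only slightly.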

You can get the optimal variables from the fitted object:

bayesProfile$optVariables
[1] "USize" "UShap" "BNucl" "Chrom" "Epith" "Thick"
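The rfe run above does not use the multinomial model or the fixed priors from the question. If you want those, here is a minimal base-R sketch, under a few assumptions of mine: each feature is treated as a categorical variable with levels 1 to 10, the class-conditional probabilities get Laplace smoothing (alpha = 1), and Class is coded 0 = malignant, 1 = benign. With only 9 features it is cheap to score all 511 non-empty subsets instead of doing recursive elimination. The helper names (loo_loglik, loocv_acc, all_subsets) are made up for this example.

```r
library(faraway)   # provides the wbca data

data(wbca)
feats    <- setdiff(names(wbca), "Class")
y        <- wbca$Class          # assumed coding: 0 = malignant, 1 = benign
prior    <- c(1/3, 2/3)         # P(Class = 0), P(Class = 1), from the question
alpha    <- 1                   # Laplace smoothing constant (assumption)
n_levels <- 10                  # assumes every feature takes values 1..10
n_k      <- c(sum(y == 0), sum(y == 1))

# For each feature, an n x 2 matrix of leave-one-out log-likelihoods:
# row i holds log P(x_i | class 0) and log P(x_i | class 1), estimated
# with observation i removed from the counts of its own class.
loo_loglik <- lapply(feats, function(f) {
  x <- wbca[[f]]
  counts <- table(factor(y, levels = 0:1), factor(x, levels = 1:n_levels))
  sapply(0:1, function(k) {
    own <- as.numeric(y == k)        # 1 where observation i is in class k
    cnt <- counts[k + 1, x] - own    # count of x_i's value in class k, without i
    log((cnt + alpha) / (n_k[k + 1] - own + alpha * n_levels))
  })
})
names(loo_loglik) <- feats

# LOOCV accuracy for a given character vector of feature names
loocv_acc <- function(subset) {
  score <- Reduce(`+`, loo_loglik[subset])     # sum log-likelihoods over features
  score <- sweep(score, 2, log(prior), `+`)    # add the log priors per class
  pred  <- ifelse(score[, 2] > score[, 1], 1, 0)
  mean(pred == y)
}

# All non-empty subsets of the 9 features
all_subsets <- unlist(
  lapply(seq_along(feats), function(m) combn(feats, m, simplify = FALSE)),
  recursive = FALSE
)
acc <- sapply(all_subsets, loocv_acc)

all_subsets[[which.max(acc)]]   # best subset by LOOCV accuracy
max(acc)                        # its accuracy
loocv_acc(feats)                # accuracy of the full set, for comparison
```

Because the best subset is picked by looking at the LOOCV accuracy of every candidate, its accuracy is somewhat optimistic, so treat a small edge over the full set with caution.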