I'm using the Boston housing data set from the MASS package and fitting splines with the gam package in R. However, this code returns an error:
```r
library(gam)
library(MASS)
library(tidyverse)

Boston.gam <- gam(medv ~ s(crim) + s(zn) + s(indus) + s(nox) + s(rm) + s(age) +
                    s(dis) + s(rad) + s(tax) + s(ptratio) + s(black) + s(lstat),
                  data = Boston)
```
The error message is:
```
A smoothing variable encountered with 3 or less unique values; at least 4 needed
```
The variable causing the issue is chas, which has only two values, 0 and 1.
What test can I use to determine whether a column has 3 or fewer unique values, so that it can be excluded from the spline terms?
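As a minimal base-R sketch of that test (assuming only that MASS is installed; n_unique and too_few are illustrative names), count the distinct values in each column with length(unique()) and keep the names of columns at or below the threshold of 3:

```r
library(MASS)

# Number of distinct values in each column of Boston
n_unique <- sapply(Boston, function(x) length(unique(x)))

# Columns that s() cannot smooth (3 or fewer distinct values)
too_few <- names(n_unique)[n_unique <= 3]
print(too_few)
# [1] "chas"
```

In this data set chas is the only column that falls under the threshold.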
Would this work?
You can use dplyr::n_distinct() to perform the unique-value check:
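```r
library(MASS)
library(tidyverse)  # provides map_dbl() (purrr) and n_distinct()/select() (dplyr)

# Number of unique values in each column
n_unique_vals <- map_dbl(Boston, n_distinct)

# Names of columns with >= 4 unique values (this drops chas)
keep <- names(n_unique_vals)[n_unique_vals >= 4]

# Model data; the dplyr:: prefix avoids a clash with MASS::select()
gam_data <- Boston %>%
  dplyr::select(all_of(keep))
```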
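To go from the retained columns to a fitted model, one option (a sketch, not part of the answer above) is to build the formula from the column names so every remaining predictor gets an s() term. It assumes the gam and MASS packages are installed, uses base R's vapply() and reformulate() in place of map_dbl(), and treats medv as the response; predictors and gam_formula are hypothetical names.

```r
library(gam)
library(MASS)

# Distinct-value count per column (base-R counterpart of map_dbl(Boston, n_distinct))
n_unique_vals <- vapply(Boston, function(x) length(unique(x)), integer(1))

# Predictors with at least 4 distinct values, excluding the response medv
predictors <- setdiff(names(n_unique_vals)[n_unique_vals >= 4], "medv")

# Build medv ~ s(crim) + s(zn) + ... from the column names
gam_formula <- reformulate(paste0("s(", predictors, ")"), response = "medv")
print(gam_formula)

# Fit the additive model; chas is left out because it has only two values
Boston.gam <- gam(gam_formula, data = Boston)
summary(Boston.gam)
```

Because chas is binary, it could also enter the model as a plain linear term (for example by appending "chas" to the term labels passed to reformulate()) rather than being dropped entirely.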