Auto cleaning functions before modelling in R

Perhaps this is a dumb question, but I am a new convert from SAS and I am still finding my way around. What is the easiest way to clean a data set before running models? E.g., I have a dataset with 100 variables. How can I remove character/factor variables with fewer than 2 levels before running a model? This seems to happen on the fly in SAS, and I find it a pain to manually drop variables in R before modelling. Surely there must be a better way. Thanks in advance.

Solution

You could try (a modification of @Richard Scriven's answer):

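# Flag factor columns with fewer than 2 levels
indx <- sapply(dat, function(x) is.factor(x) && nlevels(x) < 2)
# drop = FALSE keeps a data frame even if only one column remains
dat1 <- dat[, !indx, drop = FALSE]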
head(dat1)
#       Col1 Col3
#1  1.3709584    B
#2 -0.5646982    B
#3  0.3631284    B
#4  0.6328626    D
#5  0.4042683    A
#6 -0.1061245    D

If you have both character and factor columns and want to remove every such column with fewer than 2 unique levels/values, the factor-only check is not enough. To illustrate, convert Col4 to character:

dat$Col4 <- as.character(dat$Col4)

With the index from the code above, Col4 is kept, which is wrong; it is no longer a factor, so the factor-only test does not flag it:

 head(dat[,!indx],2)
 #        Col1 Col3 Col4
 #1  1.3709584    B  Yes
 #2 -0.5646982    B  Yes
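
To see why the factor-only test misses a character column, it can help to look at the two parts of the condition separately. The following is only a small sketch on a made-up data frame (`toy` and its column names are hypothetical, not part of the original example):

```r
# Toy data (hypothetical): one constant factor and one constant character column
toy <- data.frame(num = c(1.5, 2.5, 3.5),
                  fac = factor(c("A", "A", "A")),
                  chr = c("Yes", "Yes", "Yes"),
                  stringsAsFactors = FALSE)

# levels() is NULL for non-factors, so length(levels(x)) is 0 for num and chr
n_levels <- sapply(toy, function(x) length(levels(x)))
is_fac   <- sapply(toy, is.factor)

# Only the factor column passes both conditions; the constant character
# column is never flagged because is.factor() is FALSE for it
flagged <- n_levels < 2 & is_fac
flagged
```

Here only `fac` is flagged, even though `chr` is just as uninformative for a model.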

Here, you could instead flag every non-numeric column with fewer than 2 unique values:

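# Flag any non-numeric column (factor or character) with fewer than 2 unique values
indx1 <- sapply(dat, function(x) !is.numeric(x) && length(unique(x)) < 2)
head(dat[, !indx1, drop = FALSE])
#       Col1 Col3
#1  1.3709584    B
#2 -0.5646982    B
#3  0.3631284    B
#4  0.6328626    D
#5  0.4042683    A
#6 -0.1061245    D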
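
If this cleaning step is needed for many data sets, the same idea could be wrapped in a small function. This is only a sketch: the name `drop_constant_cols` and the `min_levels` argument are made up for illustration, and, unlike the one-liner above, it ignores `NA` when counting distinct values, which is an assumption about what "fewer than 2 levels" should mean.

```r
# Hypothetical helper: drop non-numeric columns that have fewer than
# `min_levels` distinct non-missing values
drop_constant_cols <- function(df, min_levels = 2) {
  is_constant <- vapply(df, function(x) {
    !is.numeric(x) && length(unique(x[!is.na(x)])) < min_levels
  }, logical(1))
  # drop = FALSE keeps a data frame even if one or zero columns remain
  df[, !is_constant, drop = FALSE]
}

# Toy data (hypothetical) with a constant factor, a constant character
# column containing an NA, and an informative character column
toy <- data.frame(num = c(1.5, 2.5, 3.5),
                  fac = factor(c("A", "A", "A")),
                  chr = c("Yes", NA, "Yes"),
                  grp = c("x", "y", "x"),
                  stringsAsFactors = FALSE)

cleaned <- drop_constant_cols(toy)
names(cleaned)
# expected: "num" "grp"
```

Numeric columns are left alone, as in the answer above; whether constant numeric columns should also be dropped depends on the model being fitted.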

Data

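set.seed(42)
# stringsAsFactors=TRUE is needed in R >= 4.0.0, where the default is FALSE,
# so that Col2, Col3 and Col4 are created as factors
dat <- data.frame(Col1=rnorm(25), Col2=LETTERS[1],
     Col3=sample(LETTERS[1:5], 25, replace=TRUE), Col4="Yes",
     stringsAsFactors=TRUE)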