Checking R formulas for syntax and spelling errors before passing them to the lm function


I have a Shiny app where the user defines a model formula and then runs the model, and I want to make sure the formula is valid first. The user enters only the right-hand side of the formula, so the left-hand side is always "response ~". How can I verify that the formula will not result in an error before passing it to the lm function?

set.seed(123)
df <- data.frame(response = rnorm(20, 35, 12),
                 Gender = sample(c("M", "F"), 20, replace = TRUE),
                 Temp = rnorm(20, 70, 5),
                 Drug = sample(c("A", "B", "C"), 20, replace = TRUE))

I want the check to correctly find wrong formulas on the right-hand side, like the ones below.

Syntax errors:

form <- "Drug + "
form <- "Drug + Gender + Gender(Drug"
form <- "+Drug+Gender"
form <- "Drug + Gender + Gender*Drug + "

Spelling errors:

form <- "drug + Gender + Drug*Gender"

and any other possibilities that would make this call fail:

lm(data = df, formula = paste("response ~", form))

Solution

    1) Syntax check. This returns TRUE if there is a syntax error and FALSE otherwise.

     form <- "Drug + "
     fo <- try(formula(paste("~", form)), silent = TRUE)
     inherits(fo, "try-error")
     ## [1] TRUE
    

    2) Valid variable check. This stores the variables that are not columns of df in badvars. If length(badvars) > 0 is TRUE, there are bad variables; otherwise there are none.

    form <- "Drug + Gender + XYZ"
    fo <- try(formula(paste("~", form)), silent = TRUE)
    
    # run the lines below only if fo passed the syntax check in 1),
    # i.e. inherits(fo, "try-error") is FALSE
    badvars <- setdiff(all.vars(fo), names(df))
    length(badvars) > 0
    ## [1] TRUE
    badvars
    ## [1] "XYZ"
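
The two checks above can be combined into a single helper that a Shiny app calls before fitting. The sketch below is one possible way to do that, not part of the original answer: the function name check_rhs, its arguments, and the shape of its return value are hypothetical. It also adds a third step, an assumption on my part, that builds the model frame inside tryCatch to catch formulas that parse and use only known variables but still fail when evaluated (for example a variable name used as a function).

```r
# Example data from the question.
set.seed(123)
df <- data.frame(response = rnorm(20, 35, 12),
                 Gender = sample(c("M", "F"), 20, replace = TRUE),
                 Temp = rnorm(20, 70, 5),
                 Drug = sample(c("A", "B", "C"), 20, replace = TRUE))

# Hypothetical helper: validate a right-hand-side formula string against a
# data frame. Returns a list with ok (logical), reason and detail.
check_rhs <- function(rhs, data, response = "response") {
  # 1) Syntax check: try to build a one-sided formula from the string.
  fo <- tryCatch(stats::as.formula(paste("~", rhs)), error = function(e) e)
  if (inherits(fo, "error")) {
    return(list(ok = FALSE, reason = "syntax", detail = conditionMessage(fo)))
  }

  # 2) Variable check: every variable in the formula must be a column of data.
  badvars <- setdiff(all.vars(fo), names(data))
  if (length(badvars) > 0) {
    return(list(ok = FALSE, reason = "unknown variables", detail = badvars))
  }

  # 3) Evaluation check (an addition, not in the answer above): build the
  #    model frame for the full formula and catch any remaining error.
  full <- stats::as.formula(paste(response, "~", rhs))
  mf <- tryCatch(stats::model.frame(full, data = data), error = function(e) e)
  if (inherits(mf, "error")) {
    return(list(ok = FALSE, reason = "evaluation", detail = conditionMessage(mf)))
  }

  list(ok = TRUE, reason = NA_character_, detail = character(0))
}

# Usage with some of the strings from the question.
check_rhs("Drug + ", df)$reason
check_rhs("drug + Gender + Drug*Gender", df)$detail
check_rhs("Drug + Gender(Drug)", df)$reason
check_rhs("Drug + Gender + Drug*Gender", df)$ok
```

Returning a reason and a detail, rather than a bare TRUE or FALSE, lets the app show the user what to fix. If the helper reports ok = TRUE, the same string can then be passed to lm(data = df, formula = paste("response ~", form)) as in the question.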