
Calling the glm() function within a user-defined function


I have been trying to write a function that calls glm() internally, but I always get an error message. It looks like glm() cannot find the variables that I pass in as arguments.

set.seed(234)
sex <- sample(c("M", "F"), size=100, replace=TRUE)
age <- rnorm(n=100, mean=20 + 4*(sex=="F"), sd=0.1)
dsn <- data.frame(sex, age)
rm(sex, age) #remove sex and age from the global environment for reproducibility

to_analyze <- function(dep, indep, data){
  glm(dep~factor(indep), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#> Error in eval(predvars, data, env): object 'age' not found



Solution

  • You could use any of the following:

    Using substitute:

    to_analyze <- function(dep, indep, data){
      glm(substitute(dep ~ factor(indep)), data=data)
    }
    
    to_analyze(dep=age, indep=sex, data=dsn)
    

    Advantage: indep can be an arbitrary expression rather than a single variable name (factor() is then applied to the whole expression). For example:

     to_analyze(Petal.Width, Sepal.Length + Sepal.Width, data = iris)
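
To see why the substitute version works, it can help to look at the expression that substitute() builds before glm() sees it. The sketch below uses a hypothetical helper, show_formula (not part of the original answer), that returns the expression instead of fitting a model. Note that age and sex do not need to exist here, because substitute() never evaluates them.

```r
# Hypothetical helper for illustration only: returns the unevaluated
# expression that would be handed to glm() in the substitute approach.
show_formula <- function(dep, indep) {
  substitute(dep ~ factor(indep))
}

f <- show_formula(age, sex)
print(f)         # age ~ factor(sex)
print(class(f))  # "call": an unevaluated call to `~`, not yet a formula

# An expression passed as indep is inserted as-is inside factor()
g <- show_formula(Petal.Width, Sepal.Length + Sepal.Width)
print(g)         # Petal.Width ~ factor(Sepal.Length + Sepal.Width)
```

The argument names are replaced by whatever the caller typed, so glm() receives the real column names and can look them up in data.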
    

    Using reformulate, as suggested by @NelsonGon:

    to_analyze <- function(dep, indep, data){
      glm(reformulate(sprintf("factor(%s)", indep), dep), data = data)
    }
    

    Note that to call this function, the variable names must be passed as character strings:

     to_analyze(dep= "age", indep="sex", data=dsn)
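
As a rough illustration of what the reformulate approach hands to glm(), the sketch below builds the formula on its own, without fitting anything. The column name "height" in the second call is a made-up example, not something from the question's data.

```r
# Build the term label first, then let reformulate() assemble the formula.
term <- sprintf("factor(%s)", "sex")
f <- reformulate(term, response = "age")
print(f)                       # age ~ factor(sex)
print(inherits(f, "formula"))  # TRUE

# Several term labels are joined with "+"; "height" is hypothetical.
f2 <- reformulate(c("factor(sex)", "height"), response = "age")
print(f2)                      # age ~ factor(sex) + height
```

Because reformulate() takes a character vector of terms, this variant extends naturally to several predictors.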
    

    Recall that glm can also take a string that can be parsed into a formula:

    to_analyze <- function(dep, indep, data){ 
      glm(sprintf("%s~factor(%s)", dep, indep),  data = data) 
    }
    
    to_analyze("age", "sex", data=dsn)
    

    or even:

    to_analyze <- function(dep, indep, data){ 
      glm(paste(dep,"~ factor(",indep,")"),  data = data) 
    }
    
    to_analyze("age", "sex", data=dsn)
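
One thing worth checking with the string-based variants is whether the extra spaces that paste() inserts matter. A minimal sketch, using as.formula() directly on the two strings to see how they parse (this is an illustration of the parsing step only, not of glm() internals):

```r
# The two strings differ only in whitespace.
s_sprintf <- sprintf("%s~factor(%s)", "age", "sex")
s_paste   <- paste("age", "~ factor(", "sex", ")")
print(s_sprintf)  # "age~factor(sex)"
print(s_paste)    # "age ~ factor( sex )"

# Once parsed, both yield the same formula.
f_sprintf <- as.formula(s_sprintf)
f_paste   <- as.formula(s_paste)
print(f_sprintf)  # age ~ factor(sex)
print(f_paste)    # age ~ factor(sex)
```

So the choice between sprintf and paste here is a matter of taste.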
    

    Lastly, you can combine substitute and paste:

    to_analyze <- function(dep, indep, data){ 
      glm(paste(substitute(dep),"~ factor(",substitute(indep),")"),  data = data) 
    }
    

    This works for both symbols and character strings, e.g.:

    to_analyze(age, sex, data=dsn)
    to_analyze("age", "sex", data=dsn)
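
As a sanity check, the variants above can be compared against a direct glm() call on the question's data. The sketch below is self-contained; the names via_substitute, via_reformulate and via_both are hypothetical labels for the functions from this answer, introduced only so they can coexist in one script.

```r
# Recreate the data from the question
set.seed(234)
sex <- sample(c("M", "F"), size = 100, replace = TRUE)
age <- rnorm(n = 100, mean = 20 + 4 * (sex == "F"), sd = 0.1)
dsn <- data.frame(sex, age)
rm(sex, age)  # make sure only the columns of dsn can be found

# Reference fit with the formula written out by hand
fit_direct <- glm(age ~ factor(sex), data = dsn)

via_substitute <- function(dep, indep, data) {
  glm(substitute(dep ~ factor(indep)), data = data)
}
via_reformulate <- function(dep, indep, data) {
  glm(reformulate(sprintf("factor(%s)", indep), dep), data = data)
}
via_both <- function(dep, indep, data) {
  glm(paste(substitute(dep), "~ factor(", substitute(indep), ")"), data = data)
}

fits <- list(
  via_substitute(age, sex, data = dsn),
  via_reformulate("age", "sex", data = dsn),
  via_both(age, sex, data = dsn),
  via_both("age", "sex", data = dsn)
)

# Compare each wrapper's coefficients with the reference fit;
# every entry should be TRUE if the variants agree.
all_equal <- vapply(
  fits,
  function(f) isTRUE(all.equal(coef(f), coef(fit_direct))),
  logical(1)
)
print(all_equal)
print(coef(fit_direct))
```

If all entries agree, the choice between the variants comes down to whether callers should pass bare names, strings, or either.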