
User-defined function to iterate through factor levels in a regression


I am a beginner in R, so I apologize if my question is basic and has already been answered elsewhere, but I could not find the answer.

One of my predictor variables, nationality, has 8 levels. I want to create a user-defined function that loops through each level of nationality, using one level per regression. I created a list of the levels of nationality as follows:

mylist <- list("bangladeshian", "british", "filipino", "indian",
               "indonesian", "nigerian", "pakistani", "spanish")

Then I created a user-defined function:

f1 <- function(x) { 
  l <- summary(glm(smoke ~ I(nationality == mylist[x]),
                   data=df.subpop, family=binomial(link="probit")))
  print(l)
}

f1(2)

f1(2) gives this output:

Call:
glm(formula = smoke ~ I(nationality == mylist[x]), 
    family = binomial(link = "probit"), data = df.subpop)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.629  -0.629  -0.629  -0.629   1.853  

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      -0.9173     0.1659  -5.530 3.21e-08 ***
I(nationality == mylist[x])TRUE  -4.2935   376.7536  -0.011    0.991    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 73.809  on 78  degrees of freedom
Residual deviance: 73.416  on 77  degrees of freedom
AIC: 77.416

Number of Fisher Scoring iterations: 14

As you can see, the coefficient for nationality is labeled "I(nationality == mylist[x])TRUE", which is not very informative: the reader has to look back at the call f1(2) and then at mylist to work out which level the coefficient represents. I believe there should be a cleaner and more straightforward way to run a regression for each level without having to call f1() 8 times.


Solution

  • Consider building the formula dynamically with as.formula or reformulate, so that the level name itself appears in the model call:

    nationality_levels <- levels(df.subpop$nationality)
    
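    f1 <- function(x) { 
      # Build the formula for one level; the two calls below are
      # equivalent, so keep whichever you prefer
      f <- as.formula(paste0("smoke ~ I(nationality == '", x, "')"))
      f <- reformulate(paste0("I(nationality == '", x, "')"), "smoke")
    
      # Return the summary so lapply() can collect it
      summary(
        glm(f, data=df.subpop, family=binomial(link="probit"))
      )
    }
    
    # One regression summary per level (levels() assumes nationality is a factor)
    reg_list <- lapply(nationality_levels, f1)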
    reg_list
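
To address the readability concern raised in the question, one option is to name each list element after its level and gather the indicator's coefficient row from every model into a single table. The following is a minimal sketch rather than the answerer's code: it uses simulated data as a stand-in for df.subpop, and the names fit_level and coef_table are hypothetical.

```r
# Simulated stand-in for df.subpop (made-up data, for illustration only)
set.seed(1)
df.subpop <- data.frame(
  nationality = factor(sample(c("british", "indian", "spanish"),
                              200, replace = TRUE)),
  smoke = rbinom(200, 1, 0.3)
)

nationality_levels <- levels(df.subpop$nationality)

# Fit one probit model per level, with the level name embedded in the formula
fit_level <- function(lvl) {
  f <- reformulate(paste0("I(nationality == '", lvl, "')"), response = "smoke")
  summary(glm(f, data = df.subpop, family = binomial(link = "probit")))
}

# Name each element after its level so every summary identifies itself
reg_list <- setNames(lapply(nationality_levels, fit_level), nationality_levels)

# Row 2 of each coefficient matrix is the level indicator;
# stack those rows into one table with one row per level
coef_table <- do.call(rbind, lapply(reg_list, function(s) coef(s)[2, ]))
coef_table
```

With named elements, reg_list[["british"]] retrieves a single summary directly, and coef_table gives the estimate, standard error, z value, and p-value for every level side by side.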