
R Function works on its own, error when used within lapply


I need to perform a univariate logistic regression on all variables in my dataframe. I have 166 variables, and I have been trying to use lapply to simplify this process. However, I keep getting the error:

> lapply(data$Gates, FUN=Lmodel)

Error in model.frame.default(formula = Sstatus ~ x, data = data,
na.action = na.exclude,  : 
variable lengths differ (found for 'x') 

I built the function Lmodel like so:

Lmodel<-function(x){
(glm(Sstatus~x, data=data, family="binomial"))
}

The function works when not used in conjunction with lapply:

> Lmodel(data$Gates)

Call:  glm(formula = Sstatus ~ x, family = "binomial", data = data, 
na.action = na.exclude)

Coefficients:
(Intercept)           xy  
 2.5986      -0.6527  

Degrees of Freedom: 169 Total (i.e. Null);  168 Residual
(8 observations deleted due to missingness)
Null Deviance:      96.72 
Residual Deviance: 95.57    AIC: 99.57

My dependent variable Sstatus does contain some missing values, and I am thinking this is where my problem is. However, I don't understand why the function works on its own, but not when used with lapply. How can I fix this?


Solution

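  • lapply(data$Gates, FUN=Lmodel) iterates over the individual elements of the vector data$Gates, so each call to Lmodel receives a single value as x while Sstatus keeps its full length. That is why the variable lengths differ. If you want to use lapply, you have to pass it a list in which each element is the vector holding the observations of one independent variable. For example, suppose you have a dataset with the variables Sstatus, indepen1, indepen2, indepen3.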

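    # make a list with one element per independent variable
    list.of.indepent <- vector("list", 3)
    list.of.indepent[[1]] <- data$indepen1
    list.of.indepent[[2]] <- data$indepen2
    list.of.indepent[[3]] <- data$indepen3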
    

    Then

    lapply(list.of.indepent, FUN=Lmodel) 
    

    should work.

    You might need to edit your Lmodel function as follows, so that Sstatus is referenced through data$ directly instead of the data argument:

    Lmodel <- function(x) {
      glm(data$Sstatus ~ x, family = "binomial")
    }
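
As a rough end-to-end sketch of the approach above: a data frame is itself a list of columns, so the list does not have to be built by hand. The data below is simulated purely for illustration (the values, the sample size, and the helper names element.lengths, predictors, and models are assumptions, not taken from the question). Only Sstatus, indepen1 to indepen3, and Lmodel come from the text above.

```r
# Simulated stand-in for the asker's data (all values are made up).
set.seed(1)
n <- 60
data <- data.frame(
  Sstatus  = rbinom(n, 1, 0.7),
  indepen1 = factor(sample(c("n", "y"), n, replace = TRUE)),
  indepen2 = rnorm(n),
  indepen3 = rnorm(n)
)
data$Sstatus[c(3, 17)] <- NA  # a few missing outcomes, as in the question

# Edited version of Lmodel from the answer.
Lmodel <- function(x) {
  glm(data$Sstatus ~ x, family = "binomial")
}

# Why the original call fails: lapply() over a single column hands the
# function one observation at a time, so every x has length 1.
element.lengths <- lengths(lapply(data$indepen2, identity))
unique(element.lengths)

# Passing the data frame (minus the outcome) gives one full column per call.
predictors <- setdiff(names(data), "Sstatus")
models <- lapply(data[predictors], Lmodel)

# The result is a named list with one fitted model per predictor.
names(models)
lapply(models, coef)
```

Because the result keeps the column names, a single fit can be inspected with, for example, summary(models$indepen2). Rows with a missing Sstatus are dropped by glm's default NA handling in each fit.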