I need to perform a univariate logistic regression on all variables in my dataframe. I have 166 variables, and I have been trying to use lapply to simplify this process. However, I keep getting the error:
> lapply(data$Gates, FUN=Lmodel)
Error in model.frame.default(formula = Sstatus ~ x, data = data,
na.action = na.exclude, :
variable lengths differ (found for 'x')
I built the function Lmodel like so:
Lmodel<-function(x){
(glm(Sstatus~x, data=data, family="binomial"))
}
The function works when not used in conjunction with lapply:
> Lmodel(data$Gates)
Call: glm(formula = Sstatus ~ x, family = "binomial", data = data,
na.action = na.exclude)
Coefficients:
(Intercept) xy
2.5986 -0.6527
Degrees of Freedom: 169 Total (i.e. Null); 168 Residual
(8 observations deleted due to missingness)
Null Deviance: 96.72
Residual Deviance: 95.57 AIC: 99.57
My dependent variable Sstatus does contain some missing values, and I am thinking this is where my problem is. However, I don't understand why the function works on its own, but not when used with lapply. How can I fix this?
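If you want to use lapply, you have to pass it a list in which each element is the vector holding the observations of one independent variable. lapply(data$Gates, FUN=Lmodel) fails because lapply iterates over the individual elements of data$Gates, so each call receives a single value as x while Sstatus keeps its full length; that is why you get "variable lengths differ". For example, suppose you have a dataset with variables Sstatus, indepen1, indepen2, indepen3.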
# make a list with one element per independent variable
list.of.indepent <- vector("list", 3)
list.of.indepent[[1]] <- data$indepen1
# etc. for indepen2 and indepen3
Then
lapply(list.of.indepent, FUN=Lmodel)
should work.
You might need to edit your Lmodel function as follows, so that Sstatus is taken directly from data and x is the vector that lapply passes in:
Lmodel <- function(x) {
  glm(data$Sstatus ~ x, family = "binomial")
}
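With 166 variables, filling the list by hand gets tedious. As an alternative sketch (not the method described above), you could loop over the column names and build each formula with reformulate(), which keeps data = data in the glm call and gives readable coefficient names. The toy data frame and its column names below are made up for illustration; the sketch also assumes every column other than Sstatus is a predictor.

```r
# Toy data standing in for the real data frame (all names are hypothetical).
set.seed(1)
data <- data.frame(
  Sstatus  = rbinom(100, 1, 0.5),
  indepen1 = rnorm(100),
  indepen2 = rnorm(100),
  indepen3 = factor(sample(c("n", "y"), 100, replace = TRUE))
)
data$Sstatus[c(3, 17)] <- NA  # some missing outcomes, as in the question

# Assumption: every column except the outcome is a predictor.
predictors <- setdiff(names(data), "Sstatus")

# Fit one univariate logistic regression per predictor.
# reformulate() builds the formula Sstatus ~ <column name>.
models <- lapply(predictors, function(v) {
  glm(reformulate(v, response = "Sstatus"), data = data, family = binomial)
})
names(models) <- predictors

# Collect the coefficient table of each fitted model.
coefs <- lapply(models, function(m) coef(summary(m)))
```

Each element of models is an ordinary glm fit, so models$indepen1 or summary(models[["indepen3"]]) can be inspected as usual. Rows with a missing Sstatus are dropped per model by the default na.action.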