r logistic-regression glmnet lasso-regression

Aggregated logistic lasso regression in glmnet

In glm() it is possible to model bernoulli [0,1] outcomes with a logistic regression using the following sort of syntax.

glm(bin ~ x, df, family = "binomial")

However you can also perform aggregated binomial regression, where each observation represents a count of target events from a certain fixed number of bernoulli trials. For example see the following data:

set.seed(1)
n <- 50
cov <- 10
x <- c(rep(0,n/2), rep(1, n/2))
p <- 0.4 + 0.2*x
y <- rbinom(n, cov, p)

With these sort of data you use slightly different syntax in glm()

mod <- glm(cbind(y, cov-y) ~ x, family="binomial")
mod

# output

# Call:  glm(formula = cbind(y, cov - y) ~ x, family = "binomial")
# 
# Coefficients:
#   (Intercept)            x  
# -0.3064       0.6786  
# 
# Degrees of Freedom: 49 Total (i.e. Null);  48 Residual
# Null Deviance:        53.72 
# Residual Deviance: 39.54  AIC: 178

I was wondering is it possible to model this type of aggregated binomial data in the glmnet package? If so, what is the syntax?

Solution

Yes you can do it as the following

set.seed(1)
n <- 50
cov <- 10
x <- c(rep(0,n/2), rep(1, n/2))
x = cbind(x, xx = c(rep(0.5,20), rep(0.7, 20), rep(1,10)))
p <- 0.4 + 0.2*x
y <- rbinom(n, cov, p)

I added another covariate here called xx as glmnet accepts minimum of two covariates

In glm as you have it in your post

mod <- glm(cbind(y, cov-y) ~ x, family="binomial")
mod

# output
# Call:  glm(formula = cbind(y, cov - y) ~ x, family = "binomial")

# Coefficients:
# (Intercept)           xx          xxx  
# 0.04366      0.86126     -0.64862  

# Degrees of Freedom: 49 Total (i.e. Null);  47 Residual
# Null Deviance:        53.72 
# Residual Deviance: 38.82  AIC: 179.3

In glmnet, without regularization (lambda=0) to reproduce similar results as in glm

library(glmnet)
fit = glmnet(x, cbind(cov-y,y), family="binomial", lambda=0)
coef(fit)
# output
# 3 x 1 sparse Matrix of class "dgCMatrix"
#                     s0
# (Intercept)  0.04352689
# x            0.86111234
# xx          -0.64831806