In glm()
it is possible to model bernoulli [0,1] outcomes with a logistic regression using the following sort of syntax.
glm(bin ~ x, df, family = "binomial")
However you can also perform aggregated binomial regression, where each observation represents a count of target events from a certain fixed number of bernoulli trials. For example see the following data:
set.seed(1)
n <- 50
cov <- 10
x <- c(rep(0,n/2), rep(1, n/2))
p <- 0.4 + 0.2*x
y <- rbinom(n, cov, p)
With these sort of data you use slightly different syntax in glm()
mod <- glm(cbind(y, cov-y) ~ x, family="binomial")
mod
# output
# Call: glm(formula = cbind(y, cov - y) ~ x, family = "binomial")
#
# Coefficients:
# (Intercept) x
# -0.3064 0.6786
#
# Degrees of Freedom: 49 Total (i.e. Null); 48 Residual
# Null Deviance: 53.72
# Residual Deviance: 39.54 AIC: 178
I was wondering is it possible to model this type of aggregated binomial data in the glmnet package? If so, what is the syntax?
Yes you can do it as the following
set.seed(1)
n <- 50
cov <- 10
x <- c(rep(0,n/2), rep(1, n/2))
x = cbind(x, xx = c(rep(0.5,20), rep(0.7, 20), rep(1,10)))
p <- 0.4 + 0.2*x
y <- rbinom(n, cov, p)
I added another covariate here called xx as glmnet accepts minimum of two covariates
In glm as you have it in your post
mod <- glm(cbind(y, cov-y) ~ x, family="binomial")
mod
# output
# Call: glm(formula = cbind(y, cov - y) ~ x, family = "binomial")
# Coefficients:
# (Intercept) xx xxx
# 0.04366 0.86126 -0.64862
# Degrees of Freedom: 49 Total (i.e. Null); 47 Residual
# Null Deviance: 53.72
# Residual Deviance: 38.82 AIC: 179.3
In glmnet, without regularization (lambda=0) to reproduce similar results as in glm
library(glmnet)
fit = glmnet(x, cbind(cov-y,y), family="binomial", lambda=0)
coef(fit)
# output
# 3 x 1 sparse Matrix of class "dgCMatrix"
# s0
# (Intercept) 0.04352689
# x 0.86111234
# xx -0.64831806