I'm trying to use `glmnet` to fit a GLM whose response variable is a proportion (using `family = "binomial"`).
The help file for `glmnet` says that the response variable `y`:

> For `family="binomial"` should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class)
But I don't understand how I would build a two-column matrix. My variable is currently a single column with values between 0 and 1. Can someone help me figure out how it needs to be formatted so that `glmnet` will run it properly? Also, can you explain what the target class means?
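It is a matrix of negative-label and positive-label counts: the first column holds the failures and the second column holds the successes. The second column is the target class, i.e. the outcome whose probability the model estimates. In the example below we fit a model for the proportion of `Claims` among `Holders`: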
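```r
library(glmnet)
data <- MASS::Insurance
# column 1: holders without a claim (failures); column 2: claims (successes, the target class)
y_counts <- cbind(data$Holders - data$Claims, data$Claims)
# design matrix without its intercept column, since glmnet fits its own intercept
x <- model.matrix(~ District + Age + Group, data = data)[, -1]
fit1 <- glmnet(x = x, y = y_counts, family = "binomial", lambda = 0.001)
```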
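One way to see what "target class" means in practice is the sketch below, which assumes the `glmnet` and `MASS` packages are installed (`ins`, `p_hat` and `observed` are illustrative names, not part of the original answer). With `type = "response"`, `predict()` returns fitted probabilities for the second column of `y`. Because the intercept is not penalized, the fitted claim probabilities weighted by `Holders` should approximately add up to the total number of claims, not to the total number of non-claims.

```r
library(glmnet)

ins <- MASS::Insurance
# same layout as above: column 1 = failures, column 2 = successes (target class)
y_counts <- cbind(ins$Holders - ins$Claims, ins$Claims)
x <- model.matrix(~ District + Age + Group, data = ins)[, -1]
fit1 <- glmnet(x = x, y = y_counts, family = "binomial", lambda = 0.001)

# fitted probability of the target class (a claim) for each row
p_hat <- as.vector(predict(fit1, newx = x, type = "response"))
observed <- ins$Claims / ins$Holders
head(cbind(observed, p_hat))

# expected number of claims implied by the model vs. the observed total
c(expected = sum(p_hat * ins$Holders), observed = sum(ins$Claims))
```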
If possible, go back to the data from before you calculated the response variable and retrieve these counts. If that is not possible, you can supply a two-column matrix of proportions (second column for success), but this assumes the weight, i.e. the number of trials n, is the same for all observations:
```r
# each row now sums to 1; column 2 is the proportion of successes
y_prop <- y_counts / rowSums(y_counts)
fit2 <- glmnet(x = x, y = y_prop, family = "binomial", lambda = 0.001)
```
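If you only have the single proportion column but do know each row's number of trials, the `weights` argument of `glmnet` (documented as accepting total counts when the response is a proportion matrix) lets you keep that information. A minimal sketch, assuming the totals are available; `p` and `n` are hypothetical stand-ins rebuilt here from the `Insurance` data, and `fit_counts`, `fit_weighted` and `fit_equal` are illustrative names:

```r
library(glmnet)

ins <- MASS::Insurance
x <- model.matrix(~ District + Age + Group, data = ins)[, -1]

# hypothetical inputs: a single proportion column p and the totals n
p <- ins$Claims / ins$Holders
n <- ins$Holders

# two-column proportion matrix: column 1 = failures, column 2 = successes (target class)
y_prop <- cbind(1 - p, p)

fit_counts   <- glmnet(x, cbind(n - ins$Claims, ins$Claims), family = "binomial", lambda = 0.001)
fit_weighted <- glmnet(x, y_prop, family = "binomial", weights = n, lambda = 0.001)
fit_equal    <- glmnet(x, y_prop, family = "binomial", lambda = 0.001)

# compare coefficients from the three fits side by side
b_counts   <- as.matrix(coef(fit_counts))[, 1]
b_weighted <- as.matrix(coef(fit_weighted))[, 1]
b_equal    <- as.matrix(coef(fit_equal))[, 1]
round(cbind(counts = b_counts, weighted = b_weighted, equal = b_equal), 4)
```

With `weights = n`, each row counts in proportion to its number of trials, which should reproduce the count-matrix fit. `fit_equal` treats every row as equally informative, which is the assumption behind `fit2`.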