What does a proportional matrix look like for glmnet response variable in R?

I'm trying to use glmnet to fit a GLM whose response variable is a proportion (using family="binomial").

The help file for glmnet says that the response variable:

"For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class)"

But I don't really understand how I would get a two-column matrix. My variable is currently just a single column with values between 0 and 1. Can someone help me figure out how this needs to be formatted so that glmnet will run it properly? Also, can you explain what the target class means?


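  • It is a two-column matrix of counts: the first column holds the negative-label counts and the second column the positive-label counts. The second column is the "target class", i.e. the outcome whose probability the model estimates. In the example below we fit a model for the proportion of Claims among Holders: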

    library(glmnet)

    data = MASS::Insurance
    # first column: holders without a claim; second column: claims (target class)
    y_counts = cbind(data$Holders - data$Claims, data$Claims)
    x = model.matrix(~District+Age+Group, data=data)
    fit1 = glmnet(x=x, y=y_counts, family="binomial", lambda=0.001)

    If possible, go back to the step before the response variable was calculated and retrieve these counts. If that is not possible, you can provide a matrix of proportions instead, with the second column holding the success proportion, but this assumes the weight (the number of trials n) is the same for all observations:

    y_prop = y_counts / rowSums(y_counts)
    fit2 = glmnet(x=x,y=y_prop,family="binomial",lambda=0.001)
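
For the situation in the question, where only a single column of proportions exists, a minimal sketch of building the two-column matrix might look like the following. The vectors `p` and `n` are hypothetical stand-ins (they are not from the question): `p` plays the role of the existing 0-to-1 response, and `n` is assumed to be the number of trials behind each proportion, if that is known.

```r
# Hypothetical data: p is a single column of proportions in [0, 1],
# n is the (assumed known) number of trials behind each proportion.
p <- c(0.10, 0.25, 0.50, 0.80)
n <- c(20, 40, 10, 50)

# Two-column proportion matrix: failures first, target class (successes) second.
y_prop <- cbind(failure = 1 - p, success = p)

# If n is known, rebuild the counts so each row keeps its own weight.
success <- round(n * p)
y_counts <- cbind(failure = n - success, success = success)

print(y_prop)
print(y_counts)
```

Either matrix can then be passed as `y` to `glmnet(..., family="binomial")` together with a predictor matrix `x` that has the same number of rows. If only the proportions are kept but `n` is available, glmnet's `weights` argument is the likely place to supply it; check the help file for how weights combine with a proportion response before relying on that.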