What does a proportional matrix look like for glmnet response variable in R?


I'm trying to use glmnet to fit a GLM with a proportional response variable (using family="binomial").

The help file for glmnet says that the response variable:

"For family="binomial" should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class"

But I don't really understand how I would build a two-column matrix. My variable is currently just a single column with values between 0 and 1. Can someone help me figure out how this needs to be formatted so that glmnet will run it properly? Also, can you explain what the target class means?


Solution

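  • It is a two-column matrix of counts: the first column holds the negative-label counts and the second column holds the positive-label counts. The second column is the "target class", that is, the outcome whose probability the model estimates. In the example below we fit a model for the proportion of Claims among Holders: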

    library(glmnet)

    data = MASS::Insurance
    # column 1: holders without a claim, column 2: claims (target class)
    y_counts = cbind(data$Holders - data$Claims, data$Claims)
    x = model.matrix(~District+Age+Group, data=data)

    fit1 = glmnet(x=x, y=y_counts, family="binomial", lambda=0.001)
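
    To see what "target class" means in practice, the sketch below refits the same model and compares the fitted probabilities with the observed claim rates. It assumes the glmnet and MASS packages are installed; the names `pred` and `obs_rate` and the column labels are introduced here for illustration only.

    ```r
    library(glmnet)

    data <- MASS::Insurance
    # Column 1 = holders without a claim, column 2 = claims (the target class)
    y_counts <- cbind(NoClaim = data$Holders - data$Claims, Claim = data$Claims)
    x <- model.matrix(~ District + Age + Group, data = data)

    fit <- glmnet(x = x, y = y_counts, family = "binomial", lambda = 0.001)

    # type = "response" returns fitted probabilities for the second column
    pred <- as.numeric(predict(fit, newx = x, type = "response"))
    obs_rate <- data$Claims / data$Holders

    # Fitted values should sit near the observed claim rates, not near 1 - rate
    summary(pred)
    summary(obs_rate)
    ```

    Note that this column order is the reverse of base R's glm(), where the first column of a two-column binomial response is the success count.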
    

    If possible, go back to the step before you calculated the response variable and retrieve these counts. If that is not possible, you can provide a matrix of proportions, with the success proportion in the second column, but this assumes the weight (the number of trials n) is the same for all observations:

    y_prop = y_counts / rowSums(y_counts)
    fit2 = glmnet(x=x,y=y_prop,family="binomial",lambda=0.001)
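
    If all you have is a single column of proportions, as in the question, a minimal sketch is to bind its complement in front of it so that the proportion itself lands in the second column. When the group sizes are still known, passing them through glmnet's `weights` argument should, as far as I can tell, give essentially the same fit as the count matrix; treat that equivalence as an assumption to verify on your own data. The names `p`, `n`, `y_two_col`, `fit_w`, and `fit_c` are illustrative.

    ```r
    library(glmnet)

    data <- MASS::Insurance
    x <- model.matrix(~ District + Age + Group, data = data)

    p <- data$Claims / data$Holders   # single column of proportions in [0, 1]
    n <- data$Holders                 # group sizes, if you still have them

    # First column = 1 - p (failure), second column = p (target class)
    y_two_col <- cbind(1 - p, p)

    # Proportions plus group sizes supplied as observation weights
    fit_w <- glmnet(x = x, y = y_two_col, family = "binomial",
                    weights = n, lambda = 0.001)

    # Reference fit on the raw counts, for comparison
    fit_c <- glmnet(x = x, y = cbind(n - data$Claims, data$Claims),
                    family = "binomial", lambda = 0.001)

    # Largest absolute difference between the two coefficient vectors
    max(abs(as.matrix(coef(fit_w)) - as.matrix(coef(fit_c))))
    ```

    If the group sizes are lost, dropping the `weights` argument gives the equal-weight fit described above.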