r, sparse-matrix, r-package, lasso-regression

How to develop a logistic regression model using SGL package?


I am currently working with a dataset that has a very large number of variables, so I decided to use the sparse group LASSO variable selection technique, as implemented in the SGL package.

My problem is a logistic regression problem, which is one of the model types this package supports. However, when I try to fit it, I get an error message. My data frame is called N, and my binary vector is called Y:

> x <- as.matrix(N)
> y <- as.matrix(Y)
> data <- list(x, y=y)
> sgl_small <- cvSGL(data, groups, type="logit")

Error: NA/NaN/Inf in foreign function call (arg 1)

In the attempt above, Y was a numeric binary vector of zeros and ones, so I thought the problem might be that Y was not a factor, and I tried again:

> x <- as.matrix(N)
> y <- as.factor(Y)
> data <- list(x, y=y)
> sgl_small <- cvSGL(data, groups, type="logit")

Error in seq.default(log(max.lam), 
  log(min.lam), (log(min.lam) -   log(max.lam))/(nlam -  : 
'from' cannot be NA, NaN or infinite
 In addition: Warning messages:
 1: In mean.default(y) : argument is not numeric or logical: returning NA
 2: In mean.default(y) : argument is not numeric or logical: returning NA
 3: In Ops.factor(y, m.y) : '-' not meaningful for factors

So this error message seems to indicate that y should not be a factor. I don't know what is going wrong, especially because if I run cvSGL with y as a numeric binary vector but fit a linear model rather than a logit model (although a linear model is not meaningful for my problem), it works without any error.

This is the call I mean:

> y <- as.matrix(Y)
> data <- list(x, y=y)
> sgl_small <- cvSGL(data, groups, type="linear")

I would appreciate any help, especially from anyone who has used this package to build a logit model.


Solution

  • I found this example on the help page of cvSGL:

    set.seed(1)
    n = 50; p = 10;
    X = matrix(rnorm(n * p), ncol = p, nrow = n)
    beta = (-2:2)
    y = sample(c(0, 1), n, replace = TRUE)
    data = list(x = X, y = y)
    cvFit = cvSGL(data, type = "logit")
    

    As you can see, the "index" parameter (which you called groups) isn't used in this example, and I can't see how you defined it in your case. My guess is that the problem is the missing name on the first element of your list: list(x, y=y) leaves the matrix unnamed, so data$x is NULL. Name both elements:

    data <- list(x = x, y=y)
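
    The difference between the two list constructions can be checked in base R without calling SGL at all. The following is only a minimal sketch: N, Y and groups here are made-up stand-ins for your objects, and the final cvSGL call is left commented out because it assumes the SGL package is installed and that groups has one entry per column of x.

```r
# Hypothetical stand-ins for the asker's data frame N and binary vector Y.
set.seed(1)
n <- 50
N <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))
Y <- sample(c(0, 1), n, replace = TRUE)

x <- as.matrix(N)
y <- as.numeric(Y)  # keep y numeric; the factor version triggered the mean() warnings

unnamed <- list(x, y = y)     # first element has no name
named   <- list(x = x, y = y) # both elements are named

names(unnamed)      # the first name is the empty string
is.null(unnamed$x)  # TRUE: there is no element called "x"
dim(named$x)        # the 50 x 4 matrix is found by name

# Basic sanity checks before fitting.
stopifnot(is.numeric(named$y), all(named$y %in% c(0, 1)))
stopifnot(!anyNA(named$x), all(is.finite(named$x)))

# Hypothetical grouping: one group label per column of x.
groups <- c(1, 1, 2, 2)
stopifnot(length(groups) == ncol(named$x))

# Assumes the SGL package is installed:
# sgl_small <- SGL::cvSGL(named, index = groups, type = "logit")
```

    If the checks pass on your real data and the named list still fails with type = "logit", the next thing to look at is how groups is defined.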