I manage an internal code base that relies heavily on the glmnet package. After upgrading to the newest version (v3.0.2), my unit tests started failing for the coefficients of a Cox model. The previous version of glmnet was v2.0.16 (R 3.5.2). I am now running R v3.6.2.
I have noticed that there is a new relax = argument that appears to use un-regularized fits in the path, and I'd imagine this could cause a slight difference in the fits. However, the default is relax = FALSE, so I doubt that is the issue.
Below is a reprex based on the mtcars dataset, fitting 2 randomly chosen features and renaming two variables to time and status so as to allow fitting of a Cox model. A proper reprex comparison is difficult as it would require different R installations, but this should allow anyone to reproduce the issue.
library(magrittr)
library(dplyr)
library(glmnet)
dat <- mtcars %>%
  select(mpg, disp, status = vs, time = hp) %>%  # select 2 features; assign time & status
  mutate_at(1:2, ~ {
    log10(.x) %>% subtract(mean(.)) %>% divide_by(sd(.))  # center & scale
  }) %>%
  as.matrix()
glmnet(dat[, 1:2], dat[, 3:4], family = "cox", lambda = 0)$beta  # fit model
The result for v3.0.2 is:
#> 2 x 1 sparse Matrix of class "dgCMatrix"
#> s0
#> mpg 0.2293535
#> disp -1.8160387
The result for v2.0.16 is:
#> 2 x 1 sparse Matrix of class "dgCMatrix"
#> s0
#> mpg 0.2154324
#> disp -1.8172714
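To put a number on the discrepancy, here is a minimal sketch that compares the two sets of coefficients printed above. The values are copied by hand from the two outputs, and the tolerance of 1e-6 is only an assumed stand-in for whatever a unit test might use.

```r
# Coefficients copied from the printed output of each glmnet version
beta_new <- c(mpg = 0.2293535, disp = -1.8160387)  # v3.0.2
beta_old <- c(mpg = 0.2154324, disp = -1.8172714)  # v2.0.16

# Absolute and relative differences per coefficient
abs_diff <- abs(beta_new - beta_old)
rel_diff <- abs_diff / abs(beta_old)
print(abs_diff)
print(rel_diff)

# isTRUE() is needed because all.equal() returns a character
# description, not FALSE, when the values differ.
# The tolerance is an assumption, not a value from my test suite.
same <- isTRUE(all.equal(beta_new, beta_old, tolerance = 1e-6))
print(same)
```

The mpg coefficient moves by roughly 0.014 and disp by roughly 0.001, so this is well beyond floating-point noise.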
Have others noticed similar discrepancies? I am somewhat surprised not to have found anyone else bumping into this same issue. Am I going to have to update all my unit tests? :(
Insights and/or explanations greatly appreciated. Thanks in advance.
Slightly too long for a comment:
devtools::install_version(), see below).
coxnet (presumably the internal function called for family="cox"):
- 2.0-20:
  - Fixed a bug in internal function coxnet.deviance to do with input pred, as well as saturated loglike (missing) and weights
  - added a coxgrad function for computing the gradient
- 2.0-19: Fixed a bug in coxnet to do with ties between death set and risk set
You could use devtools::install_version("glmnet",version=...,lib=<version-specific>) to install every version from 2.0-16 to 3.0-2 inclusive, each in a separate library, to make it easy (via library("glmnet", lib.loc=...)) to load different package versions and bisect to find the specific change. (The intermediate versions were unreleased, so you'll be jumping from 2.0-18 to 3.0.)
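A rough sketch of that per-version setup follows. The list of versions is my assumption about what was released on CRAN (check the glmnet archive before relying on it), and lib_root, lib_for, install_all and cox_beta are hypothetical names, not part of any package. It rebuilds the question's data with base R, and uses callr::r() so that each version is loaded in a fresh R session rather than swapping versions within one session. The older versions are installed from source, so a working compiler toolchain is needed.

```r
# Assumed set of CRAN releases between the two versions in question;
# verify against the CRAN archive for glmnet.
versions <- c("2.0-16", "2.0-18", "3.0", "3.0-1", "3.0-2")

# Hypothetical root directory holding one library per glmnet version
lib_root <- path.expand("~/glmnet-libs")
lib_for <- function(v) file.path(lib_root, paste0("glmnet-", v))

# Install each version into its own library directory
install_all <- function(vs = versions) {
  for (v in vs) {
    dir.create(lib_for(v), recursive = TRUE, showWarnings = FALSE)
    devtools::install_version("glmnet", version = v,
                              lib = lib_for(v), upgrade = "never")
  }
  invisible(vs)
}

# Fit the question's Cox model in a fresh R session, loading glmnet
# from the version-specific library
cox_beta <- function(v) {
  callr::r(function(lib) {
    library("glmnet", lib.loc = lib)
    # Same preprocessing as the reprex: log10, then center and scale
    x <- scale(log10(as.matrix(mtcars[, c("mpg", "disp")])))
    y <- cbind(time = mtcars$hp, status = mtcars$vs)
    as.matrix(glmnet(x, y, family = "cox", lambda = 0)$beta)
  }, args = list(lib = lib_for(v)))
}

# Usage (not run here, since it installs packages):
# install_all()
# lapply(setNames(versions, versions), cox_beta)
```

Comparing the returned coefficients version by version should show where the jump happens.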
I'm guessing that one of those coxnet bug fixes is (intentionally or as a side effect) responsible for the changes.
If it were in an accessible git repository you could use git bisect with a local copy to automate the process (maybe not worth it for such a small number of changepoints), but it doesn't look like the development tree is available: there's a nice pkgdown website, but I don't see any links to a version control system.
If you have a lot of time on your hands you can download all of the archived tarballs and hunt through them for changes ...