I'm trying to use the survival package in R on a large dataset. When I pass many variables to the ridge() function, I get an error. Strangely, the limit depends on the length of the variable names, but even with very short names (e.g. X1..X200) I can't enter more than about 100 variables before I get this error message:
Error in if (any(ord > 1)) stop("Penalty terms cannot be in an interaction") : missing value where TRUE/FALSE needed
Here is a code example that generates this error:
library(survival)
# Create a test data frame with random data (200 predictors)
test.data <- data.frame(outcome=rbinom(1000,1,0.1),
time=runif(1000,0,1000),replicate(200,rnorm(1000)))
# Create a string with ridge regression formula for 100 predictors
ridge.formula.100 <- paste0("Surv(time,outcome) ~ ridge(",
paste(paste0("X",1:100),collapse=","),",theta=1)")
# Run ridge regression with 100 predictors
m1 <- coxph(as.formula(ridge.formula.100),data=test.data)
summary(m1) # Yay it works!
# Create a string with ridge regression formula for 120 predictors
ridge.formula.120 <- paste0("Surv(time,outcome) ~ ridge(",
paste(paste0("X",1:120),collapse=","),",theta=1)")
# Run ridge regression with 120 predictors
m2 <- coxph(as.formula(ridge.formula.120),data=test.data) # Gives error
# Fails with error above
Any hints as to what I'm doing wrong? Importantly, if the variable names are longer, even fewer variables can be entered into the ridge.
Thanks much!
Try putting all the variables into a matrix:
allvars <- as.matrix(test.data[,3:ncol(test.data)])
Then use this matrix in your formula:
ridge.formula <- as.formula(paste("Surv(time,outcome) ~ ridge(allvars,theta=1)"))
Now the call
m2 = coxph(ridge.formula,data=test.data)
does not give that error.
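For reference, here is a minimal end-to-end sketch of the matrix approach on the simulated data from the question. The seed and the `predictors` vector are my own additions for illustration; changing `predictors` lets you pick any subset of columns. A likely explanation for the original error, which I have not confirmed, is that the very long ridge(...) term gets deparsed into a label that no longer matches the formula's term labels. That would also explain why longer variable names lower the limit, and why a single short matrix name avoids the problem.

```r
library(survival)

set.seed(1)  # arbitrary seed (assumption), only to make the sketch reproducible

# Same simulated data as in the question: 1000 rows, predictors X1..X200
test.data <- data.frame(outcome = rbinom(1000, 1, 0.1),
                        time = runif(1000, 0, 1000),
                        replicate(200, rnorm(1000)))

# 'predictors' is a hypothetical helper name: the columns to penalize
predictors <- paste0("X", 1:200)

# One matrix instead of 200 separate arguments to ridge()
allvars <- as.matrix(test.data[, predictors])

# The formula now contains a single short term, however many columns there are
m2 <- coxph(Surv(time, outcome) ~ ridge(allvars, theta = 1), data = test.data)

# Number of fitted coefficients (one per column of allvars)
length(coef(m2))
```

Note that allvars is looked up in the environment where the formula is created, not in test.data, so its rows must line up with the rows of the data frame.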