I'm trying to apply a multivariate Cox regression analysis in R to my dataset, following this tutorial.
In particular, I am trying to apply the following function, coxph():
install.packages(c("survival", "survminer"))
library("survival")
library("survminer")
data("lung")
res.cox <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
summary(res.cox)
As you can see, in this case the names of the features (age + sex + ph.ecog) have been inserted manually in the formula.
In my case, however, I have thousands of features, so I cannot add their names manually. I need a way to insert them automatically. I tried this on the example above, without success. Here's what I tried:
featureNames <- paste(colnames(lung), collapse = " + ")
res.cox <- coxph(Surv(time, status) ~ featureNames, data = lung)
And I got this error message:
Error in model.frame.default(formula = Surv(time, status) ~ featureNames, :
variable lengths differ (found for 'featureNames')
Can someone help me? Thanks!
I'm using R version 3.6.3 on a PC running Linux Ubuntu 18.04.5 LTS.
Use reformulate(). First, set up a default formula:
fS <- Surv(time, status) ~ .
Let's say you know the features beforehand:
colnames(lung)
[1] "inst" "time" "status" "age" "sex" "ph.ecog"
[7] "ph.karno" "pat.karno" "meal.cal" "wt.loss"
features = c("ph.karno","age","meal.cal","wt.loss")
fs = reformulate(features, fS[[2]])
coxph(fs, data = lung)
Call:
coxph(formula = fs, data = lung)
               coef exp(coef)  se(coef)      z     p
ph.karno -9.152e-03 9.909e-01 7.327e-03 -1.249 0.212
age       1.629e-02 1.016e+00 1.168e-02  1.395 0.163
meal.cal  5.087e-06 1.000e+00 2.391e-04  0.021 0.983
wt.loss  -1.057e-03 9.989e-01 6.884e-03 -0.154 0.878
Likelihood ratio test=5.84 on 4 df, p=0.2113
n= 171, number of events= 124
(57 observations deleted due to missingness)
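The attempt in the question fails because featureNames is a single character string, so the formula treats it as one variable of length 1 instead of expanding it into terms. If you don't want to list the features by hand, a minimal sketch is to take every column name except the outcome columns and pass that vector to reformulate(). The names outcome_cols, fs_all and fit_all are only illustrative, and the sketch assumes that every remaining column of lung (including inst) should be a predictor:

```r
library(survival)

# Take the example data straight from the package namespace
lung <- survival::lung

# Assumption: every column that is not part of the outcome is a predictor
outcome_cols <- c("time", "status")
features <- setdiff(colnames(lung), outcome_cols)

# Build Surv(time, status) ~ inst + age + ... from the character vector.
# Column names that are not syntactically valid would need backticks first;
# the lung column names do not.
fs_all <- reformulate(features, response = quote(Surv(time, status)))

fit_all <- coxph(fs_all, data = lung)
summary(fit_all)
```

If all remaining columns really are meant as predictors, coxph(Surv(time, status) ~ ., data = lung) should give the same model, since the dot expands to every column not used on the left-hand side. Building the vector explicitly is mainly useful when you need to drop or filter some of the columns first.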