OK, so I have a dataframe that looks like this:
head(exprs, 21)
sample expr ID X_OS
1 BIX high TCGA_DM_A28E_01 26
2 BIX high TCGA_AY_6197_01 88
3 BIX high TCGA_HB_KH8H_01 553
4 BIX low TCGA_K4_6303_01 256
5 BIX low TCGA_F4_6703_01 491
6 BIX low TCGA_Y7_PIK2_01 177
7 BIX low TCGA_A6_5657_01 732
8 HEF high TCGA_DM_A28E_01 26
9 HEF high TCGA_AY_6197_01 88
10 HEF high TCGA_F4_6703_01 491
11 HEF high TCGA_HB_KH8H_01 553
12 HEF low TCGA_K4_6303_01 256
13 HEF low TCGA_Y7_PIK2_01 177
14 HEF low TCGA_A6_5657_01 732
15 TUR high TCGA_DM_A28E_01 26
16 TUR high TCGA_F4_6703_01 491
17 TUR high TCGA_Y7_PIK2_01 177
18 TUR low TCGA_K4_6303_01 256
19 TUR low TCGA_AY_6197_01 88
20 TUR low TCGA_HB_KH8H_01 553
21 TUR low TCGA_A6_5657_01 732
Simply, for each sample
, there are 7 patients, each with a survival time (X_OS
) and expression level high
or low
(expr
). In the code below, I wish to take the first sample and run it through the survdiff
function, with the outputs going to dfx
. However, I'm new to survival analysis and I'm not sure how to use the parameters of the survdiff
function. I wish to compare high
and low
expression groups for each sample
. How can I edit the function expfun
to yield the survdiff
output I need? In addition, ideally I'd love to get the pvalues out of it, but I can work on that in a later step. Thank you!
expfun = function(x) {
survdiff(Surv(x$X_OS, x$expr))
}
dfx <- pblapply(split(exprs[c("expr", "X_OS")], exprs$sample), expfun)
Try this. I added a proper Surv() call because you only had times and no status argument and I made it into a formula (with the predictor on the other side of the tilde) because Surv
function expects status as its second argument and survdiff
expects a formula as its first argument. That means you need to use the regular R regression calling convention where column names are used as the formula tokens and the dataframe is given to the data argument. If you had a censoring variable, it would be put in as the second Surv argument rather than the 1
's that I have in there now.
expfun = function(x) {
survdiff( Surv( X_OS, rep(1,nrow(x)) ) ~ expr, data=x)
}
dfx <- lapply(split(exprs[c("expr", "X_OS")], exprs$sample), expfun)
This is the result from print.survdiff:
> dfx
$BIX
Call:
survdiff(formula = Surv(X_OS, rep(1, nrow(x))) ~ expr, data = x)
N Observed Expected (O-E)^2/E (O-E)^2/V
expr=high 3 3 2.05 0.446 0.708
expr=low 4 4 4.95 0.184 0.708
Chisq= 0.7 on 1 degrees of freedom, p= 0.4
$HEF
Call:
survdiff(formula = Surv(X_OS, rep(1, nrow(x))) ~ expr, data = x)
N Observed Expected (O-E)^2/E (O-E)^2/V
expr=high 4 4 3.14 0.237 0.51
expr=low 3 3 3.86 0.192 0.51
Chisq= 0.5 on 1 degrees of freedom, p= 0.475
$TUR
Call:
survdiff(formula = Surv(X_OS, rep(1, nrow(x))) ~ expr, data = x)
N Observed Expected (O-E)^2/E (O-E)^2/V
expr=high 3 3 1.75 0.902 1.41
expr=low 4 4 5.25 0.300 1.41
Chisq= 1.4 on 1 degrees of freedom, p= 0.235
Note that you can see the code to produce the print output with:
getAnywhere(print.survdiff)