Search code examples
rsurvival-analysiscox-regression

R time at risk for each group


I have been preparing survival analysis and cox regression in R. However, my line manager is a Stata user and wants the output displayed in a similar way as Stata would display it, e.g.

# Stata code
. strate
. stsum, by (GROUP)

stsum will output a time at risk for each group and an incidence rate, and I can't figure out how to achieve this with R.


The data look roughly like this (I can't get to it as it's in a secure environment):

PERS GROUP INJURY FOLLOWUP
111  1     0      2190
222  2     1      45
333  1     1      560
444  2     0      1200

So far I have been using fairly bog standard code:

library(survival)
library(coin)
# survival analysis
table(data$INJURY, data$GROUP)
survdiff(Surv(FOLLOWUP, INJURY)~GROUP, data=data)
surv_test(Surv(FOLLOWUP, INJURY)~factor(GROUP), data=data)
surv.all <- survfit(Surv(FOLLOWUP, INJURY)~GROUP, data=data)
print(sur.all, print.rmean=TRUE)
# cox regression
cox.all<- coxph(Surv(FOLLOWUP, INJURY)~GROUP, data=data))
summary(cox.all)

Solution

  • At the the moment we have 4 lines of data and no clear description (at least to a non-user of Stata) of the desired output:

    dat <- read.table(text="PERS GROUP INJURY FOLLOWUP
    111  1     0      2190
    222  2     1      45
    333  1     1      560
    444  2     0      1200",header=TRUE)
    

    I do not know if there are functions in either the coin or the survival packages that deliver a crude event rate for such data. It is trivial to deliver crude event rates (using 'crude' in the technical sense with no disparagement intended) with ordinary R functions:

     by(dat, dat$GROUP, function(d) sum(d$INJURY)/sum(d$FOLLOWUP) )
    #----------------
    dat$GROUP: 1
    [1] 0.0003636364
    ------------------------------------------------------ 
    dat$GROUP: 2
    [1] 0.0008032129
    

    The corresponding function for time at risk (or both printed to the console) would be very a simple modification. It's possible that the 'Epi' or 'epiR' package or one of the other packages devoted to teaching basic epidemiology would have designed functions for this. The 'survival' and 'coin' authors may not have seen a need to write up and document such a simple function.

    When I needed to aggregate the ratios of actual to expected events within strata of factor covariates, I needed to construct a function that properly created the stratified tables of events (to support confidence estimates), sums of "expecteds" (calculated on basis of age,gender and duration of observation), and divide actual A/E ratios. I assemble them into a list object and round the ratios to 2 decimal places. When I got it finished, I found these most useful as a sensibility check against the results I was getting with the 'survival' and 'rms' regression methods I was using. They also help explain results to a nonstatistical audience that is more familiar with tabular methods than with regression. I now have it as part of my Startup .profile.