I want to perform a survival analysis which includes time-varying covariates, using the aalen() function from an R package called timereg. However, I am still confused as to how the data should be presented in a dataframe, and how the model formula should be specified.
Here's a made up data set:
subject_id  survival_time  weight  height  outcome_indicator
1           3              65      1.8     0
1           4              68      1.8     0
1           7              70      1.8     1
2           2              55      1.6     0
2           9              53      1.6     0
3           2              62      1.7     0
3           3              65      1.7     0
3           5              64      1.7     0
3           6              66      1.7     0
And here are some interpretations:
There are 3 subjects, identified by the subject_id variable, and they were followed up 3, 2, and 4 times, respectively.
weight is a time-varying covariate.
height is independent of time, so for each subject it remained the same at each follow-up.
If survival_time is in years, then the event of interest happened to subject 1 at year 7.
... survival_time.
Finally, a list of my questions (please don't hesitate to leave a comment even if you don't have all the answers, or if my solution is correct):
How should I specify the formula for the aalen model (or any other model that includes time-varying covariates)? Is it something like:
aalen(formula = Survf(survival_time, outcome_indicator) ~ const(height) + weight, data = data_set, id = data_set$subject_id)
where the Survf() function is used to combine the two outcome-related variables; const() is used to denote time-varying covariates, leaving other covariates as they are; data_set is the name of the dataframe; and the id parameter is used to associate different rows of the same subject?
This is likely not the right way to represent these data. Judging from the ordering of the variable survival_time, these are the cohort times at which the covariate changes. You need a lagged event time to indicate the "start" of each observation interval, set to 0 for each patient's first record. The way you have formatted the data now inflates the denominator (person-time), reduces the incidence, and attenuates the hazard ratios toward the null.
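A minimal base-R sketch of that lagged "start" column, using the toy data from the question. The column names start and stop are my own choice, not something timereg requires. The model call appears only as a comment: it assumes that aalen() accepts the counting-process form Surv(start, stop, event) from the survival package, and a single event is far too little to fit anything. As far as I understand the timereg documentation, const() marks a covariate whose effect is constant over time, which is not the same as a covariate whose value changes over time.

```r
# Toy data copied from the question.
data_set <- data.frame(
  subject_id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  survival_time     = c(3, 4, 7, 2, 9, 2, 3, 5, 6),
  weight            = c(65, 68, 70, 55, 53, 62, 65, 64, 66),
  height            = c(1.8, 1.8, 1.8, 1.6, 1.6, 1.7, 1.7, 1.7, 1.7),
  outcome_indicator = c(0, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Make sure each subject's records are in time order.
data_set <- data_set[order(data_set$subject_id, data_set$survival_time), ]

# "start" is the previous record's time within the same subject,
# and 0 for that subject's first record.
data_set$start <- ave(
  data_set$survival_time, data_set$subject_id,
  FUN = function(t) c(0, head(t, -1))
)
data_set$stop <- data_set$survival_time

print(data_set[, c("subject_id", "start", "stop", "weight", "height",
                   "outcome_indicator")])

# Assumed shape of the model call (not run here):
# library(timereg)
# aalen(Surv(start, stop, outcome_indicator) ~ const(height) + weight,
#       data = data_set, id = data_set$subject_id)
```

Each row now describes one interval (start, stop] during which the covariate values on that row apply.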
Take the first participant: they are in fact observed from 0 to 7. The first record covers 0 to 3, the next 3 to 4, and the last 4 to 7. Where have you told R this explicitly? R does not know that these records belong to the same individual. R now believes there are 3 people followed for a cumulative 3 + 4 + 7 = 14 years with 1 event, rather than 1 person followed for 7 years with 1 event (the incidence drops from 1 event per 7 person-years to 1 event per 14 person-years).
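The person-time arithmetic can be checked in a few lines of base R. This is only a sketch on the toy data from the question; the variable names (naive_py, correct_py and so on) are made up for illustration.

```r
# Follow-up times copied from the question.
subject_id    <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
survival_time <- c(3, 4, 7, 2, 9, 2, 3, 5, 6)

subject_1 <- survival_time[subject_id == 1]

# Every row read as a separate person followed from time 0.
naive_py <- sum(subject_1)                 # 3 + 4 + 7 = 14

# Rows read as the intervals (0, 3], (3, 4], (4, 7].
start      <- c(0, head(subject_1, -1))
correct_py <- sum(subject_1 - start)       # 3 + 1 + 3 = 7

# Subject 1 has exactly one event, so the crude rates are:
naive_rate   <- 1 / naive_py               # 1 event per 14 person-years
correct_rate <- 1 / correct_py             # 1 event per 7 person-years

# Same comparison for the whole toy cohort.
naive_total   <- sum(survival_time)                          # 41
correct_total <- sum(tapply(survival_time, subject_id, max)) # 7 + 9 + 6 = 22

cat("subject 1:", naive_py, "vs", correct_py, "person-years\n")
cat("cohort:   ", naive_total, "vs", correct_total, "person-years\n")
```

The mis-specified layout doubles the person-time for subject 1 and nearly doubles it for the cohort, which is why the estimated incidence is pulled down.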