Search code examples
rsurvivalsurvminer

Problems with creating a survivalobject in R using the `survival` package


I am currently working on a Kaplan-Meier-Plot in R using the survival package and I am planning on creating and designing the plot using the survminer package afterwards.

The data I have is from an excel table: data <- read_excel("worksheet".xlsx", na = "NA", sheet = 1) is what I used to import the data. The structure of the data (I manipulated it a bit before) is as following:

gropd_df [37 × 5] (S3: grouped_df/tbl_df/tbl/data.frame)
$ n : num [1:37] 1 1 1 1 15 40 5 6 1 311 ...
$ Rec : num [1:37] 0 0 0 0 0 1 0 0 0 105 ...
$ nonRec : num [1:37] 1 1 1 1 15 39 5 6 1 206 ...
$ time : num [1:37] 3 6 6 6 7.5 8 8 8.5 9 9.9...

I now want to create a Survivalobject using Surv() and then later the kaplan-meier estimate with survfit() from the Survivalobject: so <- Surv(data$time, data$Rec).

The variable n stands for the number of people observed in a study, time stands for the time after an operation (time = 0 on operations day) where the patients had a follow-up examination of their disease. If the disease did come back after the operation (determined in the follow-up examination), the patient was counted in Rec, if the disease did not reoccur, he/she was counted as nonRec.

As you can see, the events are not displayed as time and status with status being "0" or "1", but as time and number of people being "0" (nonRec) and number of people being "1" (Rec). Therefore, creating the survivalobject above does (obviously) not work out how I want:

so <- Surv(data$time, data$Rec)
Warning message:
In Surv(data$time, data$Rec) : Invalid status value, converted to NA

Does anyone know a smart solution to solve this? Changing the data manually does not work out given the size of the data. Also, I am sorry if this question may seem dumb to you but I startet using R just a few weeks ago and I`m by far not a pro in using it..
Help would be very welcome, thanks for reading :)

Update following @jpsmith comment (hope this is right):

dput(FUP)
structure(list(n = c(1, 1, 1, 1, 15, 40, 5, 6, 1, 311, 20, 7, 
5, 60, 25, 25, 62, 228, 1, 12, 29, 24, 86, 41, 200, 5, 3, 30, 
3, 30, 237, 41, 162, 10, 14, 27, 1), Rec = c(0, 0, 0, 0, 0, 1, 
0, 0, 0, 105, 1, 0, 2, 1, 0, 0, 6, 6, 0, 1, 9, 2, 15, 3, 22, 
0, 0, 1, 0, 4, 23, 10, 0, 0, 4, 2, 0), nonRec = c(1, 1, 1, 1, 
15, 39, 5, 6, 1, 206, 19, 7, 3, 59, 25, 25, 56, 222, 1, 11, 20, 
22, 71, 38, 178, 5, 3, 29, 3, 26, 214, 31, 162, 10, 10, 25, 1
), fuptime = c(3, 6, 6, 6, 7.5, 8, 8, 8.5, 9, 9.9, 10, 10, 12, 
12, 12, 12, 12, 12, 12, 12, 12.3, 13.9, 14, 15.2, 18, 21, 24.2, 
25, 28, 30, 30, 35.3, 36, 36, 36, 48.6, 66), operation3 = c("12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser", 
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser"
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-37L), groups = structure(list(fuptime = c(3, 6, 7.5, 8, 8.5, 
9, 9.9, 10, 12, 12.3, 13.9, 14, 15.2, 18, 21, 24.2, 25, 28, 30, 
35.3, 36, 48.6, 66), .rows = structure(list(1L, 2:4, 5L, 6:7, 
    8L, 9L, 10L, 11:12, 13:20, 21L, 22L, 23L, 24L, 25L, 26L, 
    27L, 28L, 29L, 30:31, 32L, 33:35, 36L, 37L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -23L), .drop = TRUE))

Solution

  • As I said in my comment, uncount() is a good way to convert your data to one-observation-per-participant. For example:

    library(tidyverse)
    
    fup1 <- fup %>% 
      uncount(n) %>% 
      mutate(Status = ifelse(Rec == 0, 0, 1)) %>% 
      select(-Rec, -nonRec) %>%
      ungroup() 
    
    fup1
    # A tibble: 1,770 × 3
       fuptime operation3 Status
         <dbl> <chr>       <dbl>
     1     3   12-laser        0
     2     6   12-laser        0
     3     6   12-laser        0
     4     6   12-laser        0
     5     7.5 12-laser        0
     6     7.5 12-laser        0
     7     7.5 12-laser        0
     8     7.5 12-laser        0
     9     7.5 12-laser        0
    10     7.5 12-laser        0
    # … with 1,760 more rows
    

    Note that I've converted your Rec and nonRec columns to a single indicator column, Status.

    Now it's easy to fit your Kaplan-Meier model:

    library(survival)
    
    survfit(Surv(fuptime, Status) ~ operation3, data=fup1)
    Call: survfit(formula = Surv(fuptime, Status) ~ operation3, data = fup1)
    
            n events median 0.95LCL 0.95UCL
    [1,] 1770   1497     14      14    15.2