Search code examples
rsurvival-analysis

Manual calculation of the Kaplan-Meier estimator


I thought this would be a trivial thing to do but I still have some trouble adjusting to writing code instead of pointing and clicking on a spreadsheet.

month = as.integer(c(1,2,3,4,5,6,7,8,9,10,11,12))
remaining = c(1000,925,852,790,711,658,601,567,530,501,485,466)
left = c(75, 73, 62, 79, 53, 57, 34, 37, 29, 16, 19, 0)
KPdata = data.frame(month, remaining, left)

> KPdata
   month remaining left    
1      1      1000   75 
2      2       925   73 
3      3       852   62 
4      4       790   79 
5      5       711   53 
6      6       658   57 
7      7       601   34
8      8       567   37 
9      9       530   29 
10    10       501   16 
11    11       485   19 
12    12       466   12 

How do I calculate the Kaplan-Meier survival function at each month? Note that I want to do this manually, I am aware that there are packages which will do it for me.


Solution

  • I think this is what you're trying to do. We use lag and cumprod to get a manual KM estimator:

    KPdata$KM_init <- lag((KPdata$remaining - KPdata$left) / KPdata$remaining)
    KPdata[1,ncol(KPdata)] <- 1
    KPdata$KM_final <- cumprod(KPdata$KM_init)
    
    KPdata
       month remaining left   KM_init KM_final
    1      1      1000   75 1.0000000    1.000
    2      2       925   73 0.9250000    0.925
    3      3       852   62 0.9210811    0.852
    4      4       790   79 0.9272300    0.790
    5      5       711   53 0.9000000    0.711
    6      6       658   57 0.9254571    0.658
    7      7       601   34 0.9133739    0.601
    8      8       567   37 0.9434276    0.567
    9      9       530   29 0.9347443    0.530
    10    10       501   16 0.9452830    0.501
    11    11       485   19 0.9680639    0.485
    12    12       466    0 0.9608247    0.466
    

    Alternatively, I think there's a different form of a KM estimator that looks like this (note that I've added a row corresponding to month = 0):

    month = as.integer(c(0,1,2,3,4,5,6,7,8,9,10,11,12))
    remaining = c(1000,1000,925,852,790,711,658,601,567,530,501,485,466)
    left = c(0,75, 73, 62, 79, 53, 57, 34, 37, 29, 16, 19, 0)
    KPdata2 = data.frame(month, remaining, left)
    
    KPdata2$KM_init <- (KPdata2$remaining - KPdata2$left) / KPdata2$remaining
    KPdata2$KM_final <- cumprod(KPdata2$KM_init)
    
    KPdata2
       month remaining left   KM_init KM_final
    1      0      1000    0 1.0000000    1.000
    2      1      1000   75 0.9250000    0.925
    3      2       925   73 0.9210811    0.852
    4      3       852   62 0.9272300    0.790
    5      4       790   79 0.9000000    0.711
    6      5       711   53 0.9254571    0.658
    7      6       658   57 0.9133739    0.601
    8      7       601   34 0.9434276    0.567
    9      8       567   37 0.9347443    0.530
    10     9       530   29 0.9452830    0.501
    11    10       501   16 0.9680639    0.485
    12    11       485   19 0.9608247    0.466
    13    12       466    0 1.0000000    0.466