As a personal exercise, I'm trying to work out how to generate a column for the number at risk, i.e. the number of observations that have not yet experienced the event at time t.
Here's some sample data:
library(tibble)

df <- tibble(
  event = c(1, 1, 1, 0, 0),
  time = c(10, 20, 30, 40, 50)
)
df
The desired output should look like:
# A tibble: 5 x 3
  event  time nrisk
  <dbl> <dbl> <dbl>
1     1    10     4
2     1    20     3
3     1    30     2
4     0    40     2
5     0    50     2
If every row is an individual and the rows are sorted by time, you can subtract the cumulative sum of event from the number of rows in the data frame.
df$n_risk <- nrow(df) - cumsum(df$event)
df
#   event  time n_risk
#   <dbl> <dbl>  <dbl>
# 1     1    10      4
# 2     1    20      3
# 3     1    30      2
# 4     0    40      2
# 5     0    50      2
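
The cumsum() approach relies on the rows already being in time order. If that is not guaranteed, one option is to compute the count in sorted order and write it back to the original row positions. This is only a minimal base R sketch: df2 is a hypothetical shuffled copy of the sample data, and it assumes there are no tied times (tied rows would each get a different count).

```r
# Hypothetical data: same values as the sample, but with the rows shuffled
df2 <- data.frame(
  event = c(0, 1, 0, 1, 1),
  time  = c(50, 30, 40, 10, 20)
)

# Row indices that would sort the data by time
ord <- order(df2$time)

# Compute the count in time order, then place each value back on its own row
n_risk <- numeric(nrow(df2))
n_risk[ord] <- nrow(df2) - cumsum(df2$event[ord])
df2$n_risk <- n_risk
df2
```

Each row keeps its original position, so the row with time 10 still gets 4 and the rows with times 30, 40 and 50 still get 2, as in the sorted example.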