Calculate the number of individual with more than i days of life with data.table in R

Here is my simplified data.table:

Individual	time_alive (day)
ID1	1
ID2	5
ID3	7
ID4	5

I need to calculate the number of individual alive at every day. I achieved this by doing a loop

for (i in c(-1:600)) {
  y<-summarise(DT , time_alive > i )
  Alive[i+2,]<-length(y[y==TRUE])
}

However this is really long with a data.frame of more than 2B observations.

I wanted to try an alternative with data.table but I am stuck at only 1 day number of alive calculation:

DT[,.N,time_alive> i][time_alive==TRUE,2]

Here, i cannot be replaced by a vector but only by 1 number. I want to calculate the number of individual with more than i days of life, without doing a loop.

My result expected for the simplified data would be:

Day	Number of individual alive
1	4
2	3
3	3
4	3
5	3
6	1
7	1
8	0

Best solution in one line, with data.table which is much faster than looping:

DT[, .(Day = seq_len(1 + max(time_alive)))][DT[,.(time_alive)], .(.N), on = .(Day <= time_alive), by = Day]

Solution

# @r2evans suggestion about making it a one-liner
# replaced res = data.table('day' = 1:max(DT$time_alive))
DT[, .(day = seq_len(1 + max(time_alive)))][
     # my original solution
     DT, .(.N) ,on = .(day <= time_alive),by = day, allow.cartesian = T]

# or 

DT[,time_alive > TARGET_NUMBER, by = individual]

I have two solutions based on what you have provided. One or both of them should be what you're looking for. See below for details/explanation

# load in data
DT = data.table('individual' = 1:4, 'time_alive' = c(1,5,7,5))
# set your target number
TARGET_NUMBER = 5

# group by individual, 
# then check if the number of days they were alive is greater than your target
# this answers "i want to calculate the number of 
# individual with more than "i" days of life

DT[,time_alive > TARGET_NUMBER, by = individual]

individual    V1
1:          1 FALSE
2:          2 FALSE
3:          3  TRUE
4:          4 FALSE

# if the result you want is that table you created. that is a little different:
# create a table with days ranging from 1 to the maximum survivor

res = data.table('day' = 1:max(DT$time_alive))

day
1:   1
2:   2
3:   3
4:   4
5:   5
6:   6
7:   7


# use joins
# join by time alive being greater than or equal to the day
# group by the specific day, and count how many observations we have
# allow.cartesian because the mapping isn't one-to-one

res[DT, .(.N) ,on = .(day <= time_alive),by = day, allow.cartesian = T]

day N
1:   1 4
2:   2 3
3:   3 3
4:   4 3
5:   5 3
6:   6 1
7:   7 1