I have a data frame that records the times at which patients died.
It looks something like this:
Time Alive Died Lost
0 375 0 2
0.0668 373 1 9
0.3265 363 2 12
0.6439 349 0 6
0.7978 343 2 1
0.8363 340 2 2
0.8844 336 2 0
0.894 334 3 2
0.9325 329 4 0
0.9517 325 4 1
I want to write a function that checks whether the time between two consecutive rows is less than a threshold.
If t2 - t1 < threshold, it should drop the second row and add that row's Died and Lost counts to the first row. The result should be a data frame whose rows are at least the threshold apart, with the counts from the dropped rows added in.
For example, with a threshold of 0.29 the second row would be removed, and its 1 death and 9 lost patients would be added to the first row's Died/Lost columns,
giving something like
Time Alive Died Lost
0 375 1 11
0.3265 363 2 12
0.6439 349 0 6
...
I've written something, but it fails when it has to merge more than one row. What's the best way to do this efficiently?
EDIT
aggregateTimes <- function(data, threshold = 0.04){
  indices <- (diff(data[,1]) < threshold)
  indices <- c(FALSE, indices)
  for(i in 1:(nrow(data)-1)){
    row1 <- data[i, ]
    row2 <- data[i+1, ]
    if((row2[,1] - row1[,1]) < threshold){
      newrow <- row1 + c(0,0, row2[, 3:4])
      data[i,] <- newrow
      data <- data[-(i+1),]
    }
  }
  return(data)
}
But the indexing fails, presumably because data has fewer rows after each removal?
To answer @Moody_Mudskipper, the expected output for a threshold of 0.29 is:
Time Alive Died Lost
0 375 1 11
0.3265 363 2 12
0.6439 349 13 11
0.9517 325 4 1
I don't know if this is exactly what you want, but this groups all the entries into fixed time intervals of width 0.29:
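library(data.table)
setDT(d)  # convert d to a data.table by reference
# bin id: which 0.29-wide interval each row falls into
d[, tt := floor(Time/0.29)]
# within each bin, carry the first Time and Alive values
d[, `:=`(newTime = first(Time), Alive = first(Alive)), keyby = tt]
# sum Died and Lost per bin
d[, lapply(.SD, sum), by = .(newTime, Alive), .SDcols = c('Died', 'Lost')]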
# newTime Alive Died Lost
# 1: 0.0000 375 1 11
# 2: 0.3265 363 2 12
# 3: 0.6439 349 4 9
# 4: 0.8844 336 13 3
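The snippet above assumes `d` already holds the table from the question. As a minimal sketch for reproducing it, `d` can be rebuilt in base R like this (values copied from the question), and the bin id that `floor(Time/0.29)` assigns to each row can be inspected directly. The name `bin` is only for illustration:

```r
# Rebuild the question's table; values are copied from the post.
d <- data.frame(
  Time  = c(0, 0.0668, 0.3265, 0.6439, 0.7978, 0.8363, 0.8844, 0.894, 0.9325, 0.9517),
  Alive = c(375, 373, 363, 349, 343, 340, 336, 334, 329, 325),
  Died  = c(0, 1, 2, 0, 2, 2, 2, 3, 4, 4),
  Lost  = c(2, 9, 12, 6, 1, 2, 0, 2, 0, 1)
)

# Fixed-width bins of 0.29: a row is grouped by the bin it falls into,
# not by its distance from the previously kept row.
bin <- floor(d$Time / 0.29)
bin
# [1] 0 0 1 2 2 2 3 3 3 3
```

This shows why the result differs from the expected output in the question: 0.8844 starts a new bin even though it is less than 0.29 after 0.6439.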
Or, more precisely, keep only the rows that are at least 0.29 after the previously kept row:
# create newTime indicator: the times of the rows to keep
newTimes <- d$Time
while(any(diff(newTimes) < 0.29)){
  i <- diff(newTimes) < 0.29
  i <- which(i)[1] + 1L
  newTimes <- newTimes[-i]
}
newTimes
# [1] 0.0000 0.3265 0.6439 0.9517
d[, tt := cumsum(Time %in% newTimes)] #grouping id
# adds new columns by grouping id (tt):
d[, `:=`(newTime = first(Time), Alive = first(Alive)), keyby = tt]
# sums Died and Lost by groups:
d[, lapply(.SD, sum), by = .(newTime, Alive), .SDcols = c('Died', 'Lost')]
# newTime Alive Died Lost
# 1: 0.0000 375 1 11
# 2: 0.3265 363 2 12
# 3: 0.6439 349 13 11
# 4: 0.9517 325 4 1
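If you would rather avoid data.table, the same greedy rule can probably be written as a single pass in base R. This is only a sketch: `aggregate_times` and `patients` are names made up for illustration, and it assumes the rows are already sorted by Time and the columns are named as in the question.

```r
# Question's table, rebuilt so the example is self-contained.
patients <- data.frame(
  Time  = c(0, 0.0668, 0.3265, 0.6439, 0.7978, 0.8363, 0.8844, 0.894, 0.9325, 0.9517),
  Alive = c(375, 373, 363, 349, 343, 340, 336, 334, 329, 325),
  Died  = c(0, 1, 2, 0, 2, 2, 2, 3, 4, 4),
  Lost  = c(2, 9, 12, 6, 1, 2, 0, 2, 0, 1)
)

# Hypothetical helper: greedy single-pass merge of rows closer than `threshold`.
aggregate_times <- function(data, threshold = 0.29) {
  keep <- logical(nrow(data))
  last_kept <- -Inf
  for (i in seq_len(nrow(data))) {
    # Keep a row only if it is at least `threshold` after the last kept row.
    if (data$Time[i] - last_kept >= threshold) {
      keep[i] <- TRUE
      last_kept <- data$Time[i]
    }
  }
  # Dropped rows inherit the group id of the preceding kept row.
  group <- cumsum(keep)
  out <- data[keep, c("Time", "Alive")]
  out$Died <- as.vector(tapply(data$Died, group, sum))
  out$Lost <- as.vector(tapply(data$Lost, group, sum))
  rownames(out) <- NULL
  out
}

res <- aggregate_times(patients, threshold = 0.29)
res
```

Because the loop never removes rows while iterating, it avoids the shrinking-index problem in the question's attempt. On the sample data it should give the same four rows as the expected output in the question.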