r dataframe data.table data-manipulation summary

Calculate duration of time under threshold in R

I have a dataset with ID, time, and value columns. For each subject, I want to calculate the total time that value stays below 1.

    library(data.table)

    ID <- rep(1:10, each = 10)
    time <- rep(1:10, times = 10)
    value <- rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
    df <- as.data.frame(cbind(ID, time, value))

    # Time span of the rows with value < 1, per subject
    df_sum <- setDT(df)[value < 1, diff(range(time)), by = .(ID)]

In this dataset the expected answer is 8 hours, but I am getting 7 hours. Is this the correct approach?

Solution

By filtering on value < 1, you remove one key row: the one at time 9, where value is exactly 1.

df[ID == 1, ]
#        ID  time value     v    t2
#     <num> <num> <num> <num> <num>
#  1:     1     1 0.001 0.001     0
#  2:     1     2 0.010 0.011     1
#  3:     1     3 0.050 0.061     2
#  4:     1     4 0.070 0.131     3
#  5:     1     5 0.090 0.221     4
#  6:     1     6 0.096 0.317     5
#  7:     1     7 0.100 0.417     6
#  8:     1     8 0.500 0.917     7
#  9:     1     9 1.000 1.917     8
# 10:     1    10 2.000 3.917     9
df[ID == 1, ][value < 1, ]
#       ID  time value     v    t2
#    <num> <num> <num> <num> <num>
# 1:     1     1 0.001 0.001     0
# 2:     1     2 0.010 0.011     1
# 3:     1     3 0.050 0.061     2
# 4:     1     4 0.070 0.131     3
# 5:     1     5 0.090 0.221     4
# 6:     1     6 0.096 0.317     5
# 7:     1     7 0.100 0.417     6
# 8:     1     8 0.500 0.917     7

In the filtered rows, time runs from 1 to 8 (t2 from 0 to 7), so diff(range(time)) is indeed 7.
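
To see the off-by-one in isolation, here is a minimal base R sketch for a single subject. It only reuses the time and value vectors from the question; the name kept is made up for illustration.

```r
# One subject's data, as in the question
time  <- 1:10
value <- c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2)

# Times that survive the value < 1 filter (the row at time 9 is dropped)
kept <- time[value < 1]

first_last <- range(kept)   # first and last time that were kept
span <- diff(first_last)    # last minus first

print(first_last)
print(span)
```

The span is the distance between the first and last retained time points, so it is one less than the number of retained rows.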

I think you need one of two solutions:

Add 1 to diff(range(.)), since you want 7 + 1 = 8:

df[ value < 1, diff(range(time)) + 1, by = ID]
#        ID    V1
#     <num> <num>
#  1:     1     8
#  2:     2     8
#  3:     3     8
#  4:     4     8
#  5:     5     8
#  6:     6     8
#  7:     7     8
#  8:     8     8
#  9:     9     8
# 10:    10     8

Include the rows where value equals 1:

df[ value <= 1, diff(range(time)), by = ID]
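
As a rough check, the sketch below rebuilds the question's data and runs both options side by side. The names opt1, opt2, opt3 and the column name duration are illustrative, not from the original code. The third variant is my own assumption rather than part of either solution: it treats each value as holding until the next observation and sums only the steps that start below the threshold, which may suit data where value crosses 1 more than once or never lands exactly on 1.

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table
df <- data.table(
  ID    = rep(1:10, each = 10),
  time  = rep(1:10, times = 10),
  value = rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
)

# Option 1: keep value < 1 and add 1 to the span
opt1 <- df[value < 1, .(duration = diff(range(time)) + 1), by = ID]

# Option 2: keep value <= 1 so the row at time 9 closes the interval
opt2 <- df[value <= 1, .(duration = diff(range(time))), by = ID]

# Assumed alternative: step length to the next observation, summed over
# the rows whose value is below the threshold (last row has no next step)
opt3 <- df[order(ID, time),
           .(duration = sum((shift(time, type = "lead") - time)[value < 1],
                            na.rm = TRUE)),
           by = ID]

print(opt1)
print(opt2)
print(opt3)
```

On this data all three agree. Note that options 1 and 2 both assume a single unbroken run below the threshold, and option 2 additionally relies on a row with value exactly 1 being present.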