Tags: r, dataframe, data.table, data-manipulation, summary

Calculate duration of time under threshold in R


I have a dataset with ID, time, and value columns. For each subject, I want to calculate the total time that value stays below 1.

    library(data.table)
    ID <- rep(1:10, each = 10)
    time <- rep(1:10, times = 10)
    value <- rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
    df <- cbind(ID, time, value)
    df <- as.data.frame(df)
    # Per ID: span of time over the rows where value is below 1
    df_sum <- setDT(df)[value < 1, diff(range(time)), by = .(ID)]

In this dataset, the expected answer is 8 hours, but I am getting 7 hours. Is this the correct way to do it?


Solution

  • By filtering on value < 1, you are removing one key row: the one where value equals 1 (time 9). Compare the rows before and after the filter:

    df[ID == 1, ]
    #        ID  time value     v    t2
    #     <num> <num> <num> <num> <num>
    #  1:     1     1 0.001 0.001     0
    #  2:     1     2 0.010 0.011     1
    #  3:     1     3 0.050 0.061     2
    #  4:     1     4 0.070 0.131     3
    #  5:     1     5 0.090 0.221     4
    #  6:     1     6 0.096 0.317     5
    #  7:     1     7 0.100 0.417     6
    #  8:     1     8 0.500 0.917     7
    #  9:     1     9 1.000 1.917     8
    # 10:     1    10 2.000 3.917     9
    df[ID == 1, ][value < 1, ]
    #       ID  time value     v    t2
    #    <num> <num> <num> <num> <num>
    # 1:     1     1 0.001 0.001     0
    # 2:     1     2 0.010 0.011     1
    # 3:     1     3 0.050 0.061     2
    # 4:     1     4 0.070 0.131     3
    # 5:     1     5 0.090 0.221     4
    # 6:     1     6 0.096 0.317     5
    # 7:     1     7 0.100 0.417     6
    # 8:     1     8 0.500 0.917     7
    

    In the filtered rows, time runs from 1 to 8 (0 to 7 in the t2 column), so diff(range(time)) is indeed 7.

    I think you need one of two solutions:

    1. Use diff(range(.)) + 1, since you want an inclusive count, 7 - 0 + 1 = 8:

      df[ value < 1, diff(range(time)) + 1, by = ID]
      #        ID    V1
      #     <num> <num>
      #  1:     1     8
      #  2:     2     8
      #  3:     3     8
      #  4:     4     8
      #  5:     5     8
      #  6:     6     8
      #  7:     7     8
      #  8:     8     8
      #  9:     9     8
      # 10:    10     8
      
    2. Include the row where value equals 1 (use <= instead of <), so that time runs from 1 to 9 and the difference is 8:

      df[ value <= 1, diff(range(time)), by = ID]
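
As a quick check, the sketch below rebuilds the sample data from the question and runs both options side by side. The names dt, opt1, opt2, res, hours_opt1 and hours_opt2 are illustrative and do not come from the original post. It assumes, as in the sample data, that readings are one time unit apart and that the below-threshold readings form a single unbroken run per ID. If there are gaps or several separate runs, diff(range(time)) would overstate the duration.

```r
library(data.table)

# Rebuild the sample data from the question: 10 subjects, 10 readings each
dt <- data.table(
  ID    = rep(1:10, each = 10),
  time  = rep(1:10, times = 10),
  value = rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
)

# Option 1: strict threshold, then add 1 for an inclusive count
opt1 <- dt[value < 1, .(hours_opt1 = diff(range(time)) + 1), by = ID]

# Option 2: keep the reading where value equals 1
opt2 <- dt[value <= 1, .(hours_opt2 = diff(range(time))), by = ID]

# Join the two results by ID so they can be compared row by row
res <- merge(opt1, opt2, by = "ID")
print(res)
```

On this sample data both columns come out as 8 for every ID. The two options agree here only because a reading with value exactly 1 immediately follows the last reading below 1. On data where no reading equals exactly 1, option 2 would give the same result as the original code, so option 1 is probably the safer of the two.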