I have a dataset with ID, time, and value columns. For each subject, I want to calculate the total time that the value stays below 1.
library(data.table)
ID <- rep(1:10, each = 10)
time <- rep(1:10, times = 10)
value <- rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
df <- cbind(ID, time, value)
df <- as.data.frame(df)
# Per ID: span between the first and last time at which value < 1
df_sum <- setDT(df)[value < 1, diff(range(time)), by = .(ID)]
For this dataset the expected answer is 8 hours, but I am getting 7 hours. Is this the correct way to do it?
By using value < 1, you are removing one key row.
df[ID == 1, ]
# ID time value v t2
# <num> <num> <num> <num> <num>
# 1: 1 1 0.001 0.001 0
# 2: 1 2 0.010 0.011 1
# 3: 1 3 0.050 0.061 2
# 4: 1 4 0.070 0.131 3
# 5: 1 5 0.090 0.221 4
# 6: 1 6 0.096 0.317 5
# 7: 1 7 0.100 0.417 6
# 8: 1 8 0.500 0.917 7
# 9: 1 9 1.000 1.917 8
# 10: 1 10 2.000 3.917 9
df[ID == 1, ][value < 1, ]
# ID time value v t2
# <num> <num> <num> <num> <num>
# 1: 1 1 0.001 0.001 0
# 2: 1 2 0.010 0.011 1
# 3: 1 3 0.050 0.061 2
# 4: 1 4 0.070 0.131 3
# 5: 1 5 0.090 0.221 4
# 6: 1 6 0.096 0.317 5
# 7: 1 7 0.100 0.417 6
# 8: 1 8 0.500 0.917 7
For this subset, time spans 1 to 8 (0 to 7 in the t2 column), so diff(range(time)) is indeed 7.
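As a quick check of the reasoning above, here is a minimal sketch for a single subject. The names one_id and kept are made up for illustration, and the data simply repeats one ID's values from the question.

```r
library(data.table)

# One subject's rows, copied from the question's example data
one_id <- data.table(
  time  = 1:10,
  value = c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2)
)

# The strict filter drops the rows at time 9 (value 1) and time 10 (value 2)
kept <- one_id[value < 1]

range(kept$time)        # first and last time kept: 1 and 8
diff(range(kept$time))  # 8 - 1 = 7, not the 8 that was expected
```

The span counts the gaps between the first and last kept time points, not the number of time points, which is where the missing hour comes from.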
I think you need one of two solutions.
Add 1 to the result with diff(range(.)) + 1, since you want to count both endpoints (7 - 0 + 1 = 8):
df[ value < 1, diff(range(time)) + 1, by = ID]
# ID V1
# <num> <num>
# 1: 1 8
# 2: 2 8
# 3: 3 8
# 4: 4 8
# 5: 5 8
# 6: 6 8
# 7: 7 8
# 8: 8 8
# 9: 9 8
# 10: 10 8
Or include the value of 1 by filtering with <= instead of <:
df[ value <= 1, diff(range(time)), by = ID]
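To compare the two options side by side, the sketch below rebuilds the question's data and runs both. The names dt, opt1, opt2 and the column name total are assumptions for illustration; on this evenly spaced example both options should agree, but that is not guaranteed for other data.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  ID    = rep(1:10, each = 10),
  time  = rep(1:10, times = 10),
  value = rep(c(0.001, 0.01, 0.05, 0.07, 0.09, 0.096, 0.1, 0.5, 1, 2), 10)
)

# Option 1: strict filter, then add 1 to the span
opt1 <- dt[value < 1, .(total = diff(range(time)) + 1), by = ID]

# Option 2: keep rows where value equals 1 as well
opt2 <- dt[value <= 1, .(total = diff(range(time))), by = ID]

# Check whether both options return 8 for every ID
all(opt1$total == 8)
all(opt2$total == 8)
```

Which option is appropriate depends on what "below 1" should mean at the boundary: option 1 counts the time points themselves, while option 2 treats the first observation at value 1 as the end of the interval.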