
Comparing a vector against each element of another vector


I'm trying to track the accumulation of events over time, e.g. the graphs of total COVID cases and deaths over the past year. My starting data is a list of individuals (rows) with the date of each event in its own column. A simplified example would be:

library(data.table)
#   Set up 20 subjects and # of days at which each of 3 events happen
(events<-data.table(Subject=1:20, Event1=100*runif(20), Event2=200*runif(20), Event3=500*runif(20)))
(accrual<-data.table(days=10*1:10))  
# Col. 1 has timepoints at which I want to count events occurring by that date

My quick way to count is to compare the whole list of dates for an event (a column) to a single date, e.g. for day 70:

> events[Event1 < 70, length(Subject)]
[1] 12

I've been trying to compare each of the 3 columns against each date in my list, one date at a time, to build a table I can use to graph accruals (see the end of the question for an example). Any time I try to do this as a vector operation (data.table, apply functions), the result is a single count, not a vector of counts with one entry per date:

> events[Event1 < accrual$days, length(Subject)]
[1] 11
> events[Event1 < accrual[,days], length(Subject)]
[1] 11
> sum(events$Event1 < accrual$days[1:10])
[1] 11

This seems to compare the vectors of events and dates element by element (recycling the shorter vector of dates), which is the documented behavior. What I really want is for the whole column to be evaluated against the first element of dates, then the second element, and so on. Having used data.table and dplyr for years, I think there should be a more elegant way to do this than looping and counting as I go. The following code works, but I feel I'm missing a simpler solution.

> # Ugly, manual way to count events for each date.
> t2<-NULL
> for(i in accrual$days) {
+   t1<-sum( events[, Event1] < i )
+   t2<-c(t2, t1)
+ }
> accrual[,Events1:=t2]
> t2<-NULL
> for(i in accrual$days) {
+   t1<-sum( events[, Event2] < i )
+   t2<-c(t2, t1)
+ }
> accrual[,Events2:=t2]
> t2<-NULL
> for(i in accrual$days) {
+   t1<-sum( events[, Event3] < i )
+   t2<-c(t2, t1)
+ }
> accrual[,Events3:=t2]
> accrual
    days Events1 Events2 Events3
 1:   10       2       1       0
 2:   20       7       2       0
 3:   30       9       2       0
 4:   40      10       4       0
 5:   50      11       5       1
 6:   60      11       6       1
 7:   70      12       6       1
 8:   80      16       6       1
 9:   90      18       8       3
10:  100      20       8       3

Thank you for your suggestions.


Solution

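  • Here is one data.table option that may help. Grouping by days makes days a single value within each group, so every event column is compared against one timepoint at a time: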

    > accrual[, as.list(colSums(events[, -c("Subject")] <= days)), days]
        days Event1 Event2 Event3
     1:   10      4      2      0
     2:   20      6      3      0
     3:   30     10      5      1
     4:   40     12      7      3
     5:   50     13      7      3
     6:   60     15      8      4
     7:   70     16      8      4
     8:   80     19      9      4
     9:   90     20     11      4
    10:  100     20     13      4
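
    The counts above do not match the table in the question for two reasons: the example data come from runif() without a fixed seed, and this answer compares with <= where the question uses <. As a rough sketch of the same idea without grouping, outer() can build the full subjects-by-timepoints comparison matrix in one step, and colSums() then counts the subjects per timepoint. The seed and the names event_cols, counts and result are assumptions added for illustration; they are not part of the original question or answer.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the example is reproducible
events <- data.table(Subject = 1:20,
                     Event1 = 100 * runif(20),
                     Event2 = 200 * runif(20),
                     Event3 = 500 * runif(20))
accrual <- data.table(days = 10 * 1:10)

# Hypothetical helper name: the columns holding event times
event_cols <- c("Event1", "Event2", "Event3")

# For each event column, outer() returns a 20 x 10 logical matrix
# (subjects in rows, timepoints in columns). colSums() counts the TRUE
# values per timepoint, i.e. the events that occurred before that day.
# "<" matches the question; use "<=" to match the answer above.
counts <- events[, lapply(.SD, function(x) colSums(outer(x, accrual$days, "<"))),
                 .SDcols = event_cols]

# One row per timepoint, one count column per event
result <- cbind(accrual, counts)
print(result)
```

    This trades the per-group evaluation for one matrix per event column, which is fine for a handful of timepoints but allocates subjects x timepoints logicals for each column.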