Search code examples
rcountaggregate

Aggregate per day counting different IDs in R


I want to get the count of cases per day, including zeros. This is my data frame example:

set.seed(1453); ID = sample(1:4, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
         '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')
df = data.frame(ID, date = as.Date(date))

So I know that there was one case for ID 1 on 2016-01-01, then one case for ID 1 on 2016-01-20. So I want to get a data frame from 2016-01-01 to 2016-01-31 with 1 on those two days and 0 otherwise. I will like the same for each ID. So this example shows one event per ID, but I have up to 15 cases per day per ID in my actual data frame.

I have used:

M <- function(timeStamps) {
  Dates <- as.Date(strftime(df$date, "%Y-%m-%d"))
  allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
  Admission <- sapply(allDates, FUN = function(X) sum(Dates == X))
  data.frame(day = allDates, Admission = Admission)
}
MM<-M(df$date)

But MM will only give me the result I want if I create a data frame for each ID.

I have done the same exercise using this example, but I get monthly aggregate results here. Ideally, I would be able to aggregate a similar data frame per day, considering 0 events per ID.


Solution

  • With the current function, we can split the 'date' by 'ID' column, apply the function, and rbind the list output to a single data.frame with ID as another column

    lst1 <- lapply(split(df$date, df$ID), M)
    out <- do.call(rbind, Map(cbind, ID = names(lst1), lst1))
    row.names(out) <- NULL
    

    -output

    > str(out)
    'data.frame':   124 obs. of  3 variables:
     $ ID       : chr  "1" "1" "1" "1" ...
     $ day      : Date, format: "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" ...
     $ Admission: int  1 0 0 0 1 0 1 0 0 0 ...
    > head(out)
      ID        day Admission
    1  1 2016-01-01         1
    2  1 2016-01-02         0
    3  1 2016-01-03         0
    4  1 2016-01-04         0
    5  1 2016-01-05         1
    6  1 2016-01-06         0
    

    Or using tidyverse, do a group by operation

    library(dplyr)
    library(tidyr)
    df %>%
      group_by(ID) %>% 
      summarise(out = M(date), .groups = 'drop') %>%
      unpack(out)
    

    -output

    # A tibble: 124 × 3
          ID day        Admission
       <int> <date>         <int>
     1     1 2016-01-01         1
     2     1 2016-01-02         0
     3     1 2016-01-03         0
     4     1 2016-01-04         0
     5     1 2016-01-05         1
     6     1 2016-01-06         0
     7     1 2016-01-07         1
     8     1 2016-01-08         0
     9     1 2016-01-09         0
    10     1 2016-01-10         0
    # … with 114 more rows