
How to obtain daily time series of categorical frequencies in R


I have a data frame as such:

data <- data.frame(daytime = c('2005-05-03 11:45:23', '2005-05-03 11:47:45', 
                           '2005-05-03 12:00:32', '2005-05-03 12:25:01',
                           '2006-05-02 10:45:15', '2006-05-02 11:15:14',
                           '2006-05-02 11:16:15', '2006-05-02 11:18:03'),
               category = c("A", "A", "A", "B", "B", "B", "B", "A"))
print(data)

              daytime category
1 2005-05-03 11:45:23        A
2 2005-05-03 11:47:45        A
3 2005-05-03 12:00:32        A
4 2005-05-03 12:25:01        B
5 2006-05-02 10:45:15        B
6 2006-05-02 11:15:14        B
7 2006-05-02 11:16:15        B
8 2006-05-02 11:18:03        A

I would like to turn this data frame into a time series of daily categorical frequencies like this:

         day cat_A_freq cat_B_freq
1 2005-05-03          3          1
2 2006-05-02          1          3

I have tried doing:

library(anytime)
data$daytime <- anytime(data$daytime)

data$day <- factor(format(data$daytime, "%D"))
table(data$day, data$category)

           A B
  05/02/06 1 3
  05/03/05 3 1

But as you can see, formatting the timestamp into a new variable, day, changes the appearance of the date. The table also does not return the days in chronological order (the years are out of order), so I cannot easily convert the result to a time series.

Is there an easier way to get the frequencies? Or, if this is the right approach, how can I get the frequencies in correct date order and into a data frame for easy conversion to a time series object?


Solution

  • A solution using the tidyverse (dplyr and tidyr). The daytime column in your data is already in year-month-day format, so we can apply as.Date to it directly without specifying a format or using other functions.

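    library(tidyverse)
    data2 <- data %>%
      # Drop the time of day and keep only the calendar date
      mutate(day = as.Date(daytime)) %>%
      # One row per day/category pair, with the count in column n
      count(day, category) %>%
      # One column per category (tidyr::pivot_wider is the newer equivalent)
      spread(category, n)
    data2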
    # # A tibble: 2 x 3
    #          day     A     B
    # *     <date> <int> <int>
    # 1 2005-05-03     3     1
    # 2 2006-05-02     1     3
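
For comparison, here is a minimal base R sketch of the same idea, assuming the daytime column holds the original character timestamps from the question. The ordering problem in the question comes from tabulating "%D" strings (mm/dd/yy), which sort alphabetically. Tabulating a Date column instead keeps the days in chronological order. The names counts and freq, and the cat_*_freq column labels, are chosen here only to match the desired output and are not part of the original answer.

```r
# Same example data as in the question
data <- data.frame(
  daytime = c("2005-05-03 11:45:23", "2005-05-03 11:47:45",
              "2005-05-03 12:00:32", "2005-05-03 12:25:01",
              "2006-05-02 10:45:15", "2006-05-02 11:15:14",
              "2006-05-02 11:16:15", "2006-05-02 11:18:03"),
  category = c("A", "A", "A", "B", "B", "B", "B", "A")
)

# Parse the date part of the timestamp string; working from the text
# (rather than a POSIXct value) avoids any time zone shift
data$day <- as.Date(as.character(data$daytime))

# Cross-tabulate day by category; Date values are ordered chronologically
counts <- table(data$day, data$category)

# Turn the contingency table into a wide data frame, one row per day
freq <- as.data.frame.matrix(counts)
freq <- data.frame(day = as.Date(rownames(freq)), freq, row.names = NULL)
names(freq)[-1] <- paste0("cat_", names(freq)[-1], "_freq")
freq
#          day cat_A_freq cat_B_freq
# 1 2005-05-03          3          1
# 2 2006-05-02          1          3
```

Because table() fills in 0 for day/category combinations that never occur, the result has no missing values. From here, a package such as zoo could index the counts by day, for example zoo::zoo(freq[-1], order.by = freq$day), if an irregular time series object is what you need.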