Search code examples
rdplyrcountaggregate

How to count the number of observations for a particular month/year in an event-level df?


I am working with an event-level df on protest patterns from 2000-2017. I want to create a new df that tells me the number of individual protest events for each month over this period. I have tried to do this with the following code but have been unsuccessful. There should thus be two columns: one listing the month/year and one listing the number of protests.

dates <- as.Date(dat$Date, '%m/%d/%Y')

yr <- year(dates)

monyr <- as.yearmon(dates)

lst <- lapply(list(dates, yr, monyr), function(x) 
  transform(dat, Count=ave(seq_along(x), x, FUN= length)))

names(lst) <- paste0('newdf', seq_along(lst))

Here is the data:

structure(list(Date = c("16-Mar-07", "19-Mar-07", "20-Mar-07", 
"21-Mar-07", "21-Mar-07", "22-Mar-07"), Region = c("Kemerovo", 
"Moscow City", "Saratov", "Novosibirsk", "Omsk", "Tyva")), row.names = c(NA, 
6L), class = "data.frame")

Thanks for your help.


Solution

  • There are the following problems:

    1. the format string does not correspond to the format of the dates in dat.
    2. if you wish to convert to yearmon it can be done directly. There is no need to convert to Date first.
    3. aggregate, table, xtabs or dplyr::count would be the appropriate function for aggregation
    4. the library statements are missing

    Assuming that what is wanted is a data frame with a yearmon column, yrmon, and a count, Freq or n column we can make these changes resulting in the following code which produces a data frame with one row per year/month. The library statement in the first line of code below is needed for all solutions.

    library(zoo)
    
    # 1 - aggregate
    yrmon <- as.yearmon(dat$Date, '%d-%b-%y')
    aggregate(dat[1], data.frame(yrmon), length)
    ##      yrmon count
    ## 1 Mar 2007     6
    
    # 2 - table
    as.data.frame(table(yrmon))  # yrmon defined above
    ##      yrmon Freq
    ## 1 Mar 2007    6
    
    # 3 - xtabs
    as.data.frame(xtabs(~ yrmon))  # yrmon defined above
    ##      yrmon Freq
    ## 1 Mar 2007    6
    
    # 4 - dplyr
    library(dplyr)
    dat %>%
      mutate(yrmon = as.yearmon(Date, '%d-%b-%y')) %>%
      count(yrmon)
    ##      yrmon n
    ## 1 Mar 2007 6
    

    The first 3 can also be expressed as a pipeline. The last one already is a pipeline.

    # 1a - as a pipeline
    dat |>
      transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
      aggregate(Date ~ yrmon, length)
    
    # 2a - table
    dat |>
      transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
      with(table(yrmon)) |>
      as.data.frame()
    
    # 3a - xtabs
    dat |>
      transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
      xtabs(~ yrmon, data = _) |>
      as.data.frame()
    

    Note

    Lines <- "Date,Region
    16-Mar-07,Kemerovo
    19-Mar-07,Moscow City
    20-Mar-07,Saratov
    21-Mar-07,Novosibirsk
    21-Mar-07,Omsk
    22-Mar-07,Tyva
    16-Mar-08,Kemerovo
    19-Mar-08,Moscow City
    20-Mar-08,Saratov
    21-Mar-08,Novosibirsk
    21-Mar-08,Omsk
    22-Mar-08,Tyva"
    
    dat2 <- read.csv(text = Lines)
    dat2 |>
      transform(yrmon = as.yearmon(Date, '%d-%b-%y')) |>
      with(table(yrmon)) |>
      as.data.frame()
    ##      yrmon Freq
    ## 1 Mar 2007    6
    ## 2 Mar 2008    6