Search code examples
rdplyr

Sorting dates into week long groups, based off another variable


I'm trying to group together events that are happening within a week of each other in the same county

#excerpt from my data

data <- data.frame(
  Region = c("PI", "PI", "PI", "PI", "NE", "SE", "SE", "NE", "NW"),
  County = c("Maui", "Maui", "Maui", "Maui", "Northampton", "Arecibo", "Carteret", "None", "Tillamook"),
  Observation.Date = as.Date(c("2005-02-28", "2005-02-28", "2005-03-01", "2005-03-03", "2005-03-03", "2005-03-17", "2005-03-25", "2005-03-26", "2005-04-02"))
)

With the above data, the first 4 are all in Maui, and happening within a week of each other, so I want to group them together. I want to be able to pull that group of data out, with information on how many values are within it, so that it would look something like this


  region  names      start_date  end_date     
 1 A      N          2024-01-16  2024-01-16
 2 A      W, V       2024-03-26  2024-03-30
 3 A      N          2024-04-30  2024-04-30
 4 B      P          2024-05-15  2024-05-15
 5 B      K, N       2024-10-17  2024-10-20

This is the function that worked for that dataset, but I am getting an error when I use it for my main dataset

data <- data %>% arrange(data$Region, data$Observation.Date)

group_within_week <- function(data) {
  data %>%
    group_by(data$Region) %>%
    arrange(data$Observation.Date) %>%
    mutate(week_group = cut(data$Observation.Date, breaks = "week", labels = FALSE)) %>%
    group_by(data$Region, week_group) %>%
    summarise(names_list = toString(data$County), 
              start_date = min(data$Observation.Date), 
              end_date = max(data$Observation.Date)) %>%
    ungroup() %>%
    select(-week_group)
}
grouped_data <- group_within_week(data)

print(grouped_data)

Solution

  • Try this:

    library(dplyr)
    data <- data.frame(
      Region = c("PI", "PI", "PI", "PI", "NE", "SE", "SE", "NE", "NW"),
      County = c("Maui", "Maui", "Maui", "Maui", "Northampton", "Arecibo", "Carteret", "None", "Tillamook"),
      Observation.Date = as.Date(c("2005-02-28", "2005-02-28", "2005-03-01", "2005-03-03", "2005-03-03", "2005-03-17", "2005-03-25", "2005-03-26", "2005-04-02"))
    )
    
    data <- data %>% arrange(data$Region, data$Observation.Date)
    
    group_within_week <- function(data) {
      data %>%
        group_by(Region) %>%
        arrange(Observation.Date) %>%
        mutate(week_group = cut(Observation.Date, breaks = "week", labels = FALSE)) %>%
        ungroup%>%
        group_by(Region, week_group) %>%
        summarise(names_list = toString(County), 
                  start_date = min(Observation.Date), 
                  end_date = max(Observation.Date)) %>%
        ungroup() %>%
        select(-week_group)
    }
    grouped_data <- group_within_week(data)
    
    print(grouped_data)
    
    # A tibble: 6 x 4
      Region names_list             start_date end_date  
      <chr>  <chr>                  <date>     <date>    
    1 NE     Northampton            2005-03-03 2005-03-03
    2 NE     None                   2005-03-26 2005-03-26
    3 NW     Tillamook              2005-04-02 2005-04-02
    4 PI     Maui, Maui, Maui, Maui 2005-02-28 2005-03-03
    5 SE     Arecibo                2005-03-17 2005-03-17
    6 SE     Carteret               2005-03-25 2005-03-25
    

    If you do not need that the county name repeats several times, wrap it into unique function.