I'm trying to group together events that are happening within a week of each other in the same county
#excerpt from my data
data <- data.frame(
Region = c("PI", "PI", "PI", "PI", "NE", "SE", "SE", "NE", "NW"),
County = c("Maui", "Maui", "Maui", "Maui", "Northampton", "Arecibo", "Carteret", "None", "Tillamook"),
Observation.Date = as.Date(c("2005-02-28", "2005-02-28", "2005-03-01", "2005-03-03", "2005-03-03", "2005-03-17", "2005-03-25", "2005-03-26", "2005-04-02"))
)
With the above data, the first 4 are all in Maui, and happening within a week of each other, so I want to group them together. I want to be able to pull that group of data out, with information on how many values are within it, so that it would look something like this
region names start_date end_date
1 A N 2024-01-16 2024-01-16
2 A W, V 2024-03-26 2024-03-30
3 A N 2024-04-30 2024-04-30
4 B P 2024-05-15 2024-05-15
5 B K, N 2024-10-17 2024-10-20
This is the function that worked for that dataset, but I am getting an error when I use it for my main dataset
data <- data %>% arrange(data$Region, data$Observation.Date)
group_within_week <- function(data) {
data %>%
group_by(data$Region) %>%
arrange(data$Observation.Date) %>%
mutate(week_group = cut(data$Observation.Date, breaks = "week", labels = FALSE)) %>%
group_by(data$Region, week_group) %>%
summarise(names_list = toString(data$County),
start_date = min(data$Observation.Date),
end_date = max(data$Observation.Date)) %>%
ungroup() %>%
select(-week_group)
}
grouped_data <- group_within_week(data)
print(grouped_data)
Try this:
library(dplyr)
data <- data.frame(
Region = c("PI", "PI", "PI", "PI", "NE", "SE", "SE", "NE", "NW"),
County = c("Maui", "Maui", "Maui", "Maui", "Northampton", "Arecibo", "Carteret", "None", "Tillamook"),
Observation.Date = as.Date(c("2005-02-28", "2005-02-28", "2005-03-01", "2005-03-03", "2005-03-03", "2005-03-17", "2005-03-25", "2005-03-26", "2005-04-02"))
)
data <- data %>% arrange(data$Region, data$Observation.Date)
group_within_week <- function(data) {
data %>%
group_by(Region) %>%
arrange(Observation.Date) %>%
mutate(week_group = cut(Observation.Date, breaks = "week", labels = FALSE)) %>%
ungroup%>%
group_by(Region, week_group) %>%
summarise(names_list = toString(County),
start_date = min(Observation.Date),
end_date = max(Observation.Date)) %>%
ungroup() %>%
select(-week_group)
}
grouped_data <- group_within_week(data)
print(grouped_data)
# A tibble: 6 x 4
Region names_list start_date end_date
<chr> <chr> <date> <date>
1 NE Northampton 2005-03-03 2005-03-03
2 NE None 2005-03-26 2005-03-26
3 NW Tillamook 2005-04-02 2005-04-02
4 PI Maui, Maui, Maui, Maui 2005-02-28 2005-03-03
5 SE Arecibo 2005-03-17 2005-03-17
6 SE Carteret 2005-03-25 2005-03-25
If you do not need that the county name repeats several times, wrap it into unique
function.