I have a data set that I am trying to split into two lists. Each list contains one element (e.g., [[1]], [[2]], [[3]] in the list object) per 10-day interval for a single ID (e.g., days 1-10 in [[1]], 11-20 in [[2]], and 21-31 in [[3]]).
In the example code below, the list for jan has three intervals for each ID (e.g., A, B, and C each have three elements, one per interval). The list for july has only two intervals for each ID, which is a problem for me (e.g., it only contains [[1]] and [[2]] in the list object instead of three).
I am trying to figure out how to remove the extra intervals in jan that do not correspond to the intervals in july. For example, for ID A I would like to create a function that compares the two lists and removes the third interval from jan (the interval missing from july). How can I go about doing this?
library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("01-01-2010"), dmy("20-07-2010"), by = "days"), 600)
ID <- rep(c("A","B","C"), 200)
df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)
df$month <- month(df$date)
jan <- df %>%
  mutate(new = floor_date(date, "10 days")) %>%
  group_by(ID) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>%
  group_by(new, .add = TRUE) %>%
  filter(month == "1") %>%
  group_split()
july <- df %>%
  mutate(new = floor_date(date, "10 days")) %>%
  group_by(ID) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>%
  group_by(new, .add = TRUE) %>%
  filter(month == "7") %>%
  group_split()
I am still not sure what you are actually after. Anyway, this code does what you asked for.
df2 <- bind_rows(jan, july) %>%
  # adding a helper variable that marks whether the day of the date is
  # 10 or lower (1), 20 or lower (2), or later (3)
  mutate(helper = ceiling(day(date)/10) %>% pmin(3)) %>%
  group_by(ID, helper) %>%
  # adding another helper that counts how many distinct months there are in the subgroup
  mutate(helper2 = n_distinct(month)) %>% ungroup() %>%
  # keeping only the ID/interval combinations present in both months
  filter(helper2 == 2) %>%
  # getting rid of the helpers
  select(-helper, -helper2) %>%
  group_by(ID, new)
jan2 <- df2 %>%
  filter(month == "1") %>%
  group_split()
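If you would rather compare the two lists directly, as the question describes, a small helper function could do it. The following is only a sketch in base R. interval_key() and keep_matching_intervals() are hypothetical names (they do not come from any package), and the sketch assumes that every list element holds a single ID and a single 10-day interval, with columns named date and ID. It uses toy data so that it runs on its own.

```r
# Hypothetical helper: label a list element with its ID and the position of
# its 10-day interval within the month (1 = days 1-10, 2 = days 11-20,
# 3 = days 21-31). Assumes one ID and one interval per element, so only the
# first row is inspected.
interval_key <- function(piece) {
  day_of_month <- as.integer(format(piece$date[1], "%d"))
  paste(piece$ID[1], pmin(ceiling(day_of_month / 10), 3), sep = "_")
}

# Hypothetical helper: keep only the elements of `x` whose ID/interval key
# also occurs somewhere in `reference`.
keep_matching_intervals <- function(x, reference) {
  x_keys   <- vapply(x, interval_key, character(1))
  ref_keys <- vapply(reference, interval_key, character(1))
  x[x_keys %in% ref_keys]
}

# Toy data (an assumption for illustration): ID "A" has three intervals in
# January but only two in July.
toy <- data.frame(
  date = as.Date(c("2010-01-05", "2010-01-15", "2010-01-25",
                   "2010-07-05", "2010-07-15")),
  ID   = "A"
)
toy$interval <- pmin(ceiling(as.integer(format(toy$date, "%d")) / 10), 3)
is_jan <- format(toy$date, "%m") == "01"

jan_toy  <- split(toy[is_jan, ],  toy$interval[is_jan])
july_toy <- split(toy[!is_jan, ], toy$interval[!is_jan])

# Drop the January intervals that have no counterpart in July.
jan_trimmed <- keep_matching_intervals(jan_toy, july_toy)
length(jan_trimmed)  # 2: the third interval is removed
```

With the lists from the question, the call would presumably be keep_matching_intervals(jan, july). The key deliberately ignores the month, so the third interval of January is compared with the third interval of July rather than with a specific date.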