Tags: r, list, dplyr, lubridate

Removing extra elements in a list based on another list


I have a data set that I am trying to split into two lists. Each element of a list (e.g., [[1]], [[2]], [[3]]) holds the rows for a single ID within one 10-day interval of the month (e.g., days 1-10 in [[1]], 11-20 in [[2]], and 21-31 in [[3]]).

In the example code below, the list for jan has three intervals for each ID (i.e., A, B, and C each have three elements, one per interval). The list for july has only two intervals for each ID because the data end on 20 July, so each ID contributes only two elements instead of three. This is the problem.

I am trying to figure out how to remove the extra intervals in jan that do not correspond to the intervals in july. For example, for ID A, I would like a function that compares the two lists and removes the third interval from jan (the one missing from july). How can I go about doing this?

library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("01-01-2010"), dmy("20-07-2010"), by = "days"), 600)
ID <- rep(c("A","B","C"), 200)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df$month <- month(df$date)

jan <- df %>%
  mutate(new = floor_date(date, "10 days")) %>%
  group_by(ID) %>% 
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(new, .add = TRUE) %>%
  filter(month == 1) %>% 
  group_split()

july <- df %>%
  mutate(new = floor_date(date, "10 days")) %>%
  group_by(ID) %>% 
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(new, .add = TRUE) %>%
  filter(month == 7) %>% 
  group_split()


Solution

  • I am still not sure what you are ultimately after, but this code does what you asked for: for each ID, it keeps only the 10-day intervals that occur in both months.

    df2 <- bind_rows(jan, july) %>%
      # helper: index of the 10-day interval within the month
      # (1 = days 1-10, 2 = days 11-20, 3 = days 21-31)
      mutate(helper = ceiling(day(date)/10) %>% pmin(3)) %>% 
      group_by(ID, helper) %>%
      # helper2: number of distinct months present in each ID/interval subgroup
      mutate(helper2 = n_distinct(month)) %>% ungroup() %>%
      filter(helper2 == 2) %>%
      # getting rid of the helpers
      select(-helper, -helper2) %>%
      group_by(ID, new)
    
    jan2 <- df2 %>%
      filter(month == "1") %>% 
      group_split()
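
An alternative way to express the same filtering, offered only as a minimal sketch: compute an interval index for every row and use `dplyr::semi_join()` to keep the January rows whose ID/interval pair also appears in July. This is not the method used in the answer above. The toy `df` here is a simplified stand-in for the question's data (one row per ID per day), and the names `interval`, `jan_df`, `july_df`, and `jan_matched` are hypothetical, introduced for illustration.

```r
library(dplyr)
library(lubridate)

# Toy data (assumption): one row per ID per day, 1 January to 20 July 2010.
dates <- seq(dmy("01-01-2010"), dmy("20-07-2010"), by = "days")
df <- data.frame(
  date = rep(dates, times = 3),
  ID   = rep(c("A", "B", "C"), each = length(dates))
)

df <- df %>%
  mutate(
    month = month(date),
    # interval index within the month:
    # 1 = days 1-10, 2 = days 11-20, 3 = days 21-31
    interval = pmin(ceiling(day(date) / 10), 3)
  )

jan_df  <- filter(df, month == 1)
july_df <- filter(df, month == 7)

# Keep only the January rows whose (ID, interval) pair also occurs in July.
jan_matched <- semi_join(jan_df, july_df, by = c("ID", "interval"))

# Split back into a list with one element per ID and interval.
jan2 <- jan_matched %>%
  group_by(ID, interval) %>%
  group_split()

length(jan2)  # 6: two intervals for each of the three IDs
```

Because `semi_join()` only filters rows and never adds columns from the second table, the January data keep their original shape. Swapping the two arguments would trim July against January in the same way.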