Search code examples
rdplyrlubridate

Nesting Specific Elements into another list in r


I have a data set that has 5 IDs and span from 01-01-2010 to 12-31-2013. I first split the data by ID, and end up with a list object. I then create another list that creates 10-day intervals and is arranged by ID.

I would like to nest these intervals into the first list of IDs based on the IDs labelled within the interval elements.

For example: The main list consist of the IDs as the elements. The [1],[2],[3] are the intervals for the ID that is nested in. For example, in [A] the intervals are all for the ID A, for [B] it is for B, for [C] it is for C, etc..

[A]
   [1]
   [2]
   [3]
[B]
   [1]
   [2]
   [3]
[C]
   [1]
   [2]
   [3]
[D]
   [1]
   [2]
   [3]
[E]
   [1]
   [2]
   [3]

The code below nests the intervals into the ID list, but it nest all of the IDs and not the specific ones that it should be in.

set.seed(12345)
library(lubridate)
library(tidyverse)

date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
ID <- rep(c("A","B","C","D", "E"), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df_ID <- split(df, df$ID)


df_nested <- lapply(df_ID, function(x){
  x %>%
    arrange(ID) %>% 
    # Creates a new column assigning the first day in the 10-day interval in which
    # the date falls under (e.g., 01-01-2010 would be in the first 10-day interval
    # so the `floor_date` assigned to it would be 01-01-2010)
    mutate(new = floor_date(date, "10 days")) %>%
    # For any months that has 31 days, the 31st day would normally be assigned its 
    # own interval. The code below takes the 31st day and joins it with the 
    # previous interval. 
    mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
    group_by(new, .add = TRUE) %>%
    group_split()
})

Solution

  • I would do it like this:

    set.seed(12345)
    library(lubridate)
    library(tidyverse)
    
    f = function(data){
      data %>% mutate(
        new = floor_date(data$date, "10 days"),
        new = if_else(day(new) == 31, new - days(10), new)
      )
    }
    
    tibble(
      ID = rep(c("A","B","C","D", "E"), 100),
      date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
      x = runif(length(date), min = 60000, max = 80000),
      y = runif(length(date), min = 800000, max = 900000)
    ) %>% group_by(ID) %>% 
      nest() %>% 
      mutate(data = map(data, f)) %>% 
      unnest(data)
    

    output

    # A tibble: 500 x 5
    # Groups:   ID [5]
       ID    date            x       y new       
       <chr> <date>      <dbl>   <dbl> <date>    
     1 A     2010-01-01 74418. 820935. 2010-01-01
     2 A     2010-01-06 63327. 885896. 2010-01-01
     3 A     2010-01-11 60691. 873949. 2010-01-11
     4 A     2010-01-16 69250. 868411. 2010-01-11
     5 A     2010-01-21 69075. 876142. 2010-01-21
     6 A     2010-01-26 67797. 829892. 2010-01-21
     7 A     2010-01-31 75860. 843542. 2010-01-21
     8 A     2010-02-05 67233. 882318. 2010-02-01
     9 A     2010-02-10 75644. 826283. 2010-02-01
    10 A     2010-02-15 66424. 853789. 2010-02-11
    

    Simple and clear, isn't it?

    Everything you want to do with data is included in the f function. You can extend it as needed.

    The rest is done in a simple scheme tibble %>% group_by %>% nest % mutate %>% unnest