I have a data set that has 5 ID
s and span from 01-01-2010
to 12-31-2013
. I first split
the data by ID
, and end up with a list object. I then create another list that creates 10-day intervals and is arranged by ID
.
I would like to nest these intervals into the first list of ID
s based on the ID
s labelled within the interval elements.
For example:
The main list consist of the ID
s as the elements. The [1]
,[2]
,[3]
are the intervals for the ID
that is nested in. For example, in [A]
the intervals are all for the ID
A, for [B]
it is for B, for [C]
it is for C, etc..
[A]
[1]
[2]
[3]
[B]
[1]
[2]
[3]
[C]
[1]
[2]
[3]
[D]
[1]
[2]
[3]
[E]
[1]
[2]
[3]
The code below nests the intervals into the ID
list, but it nest all of the ID
s and not the specific ones that it should be in.
set.seed(12345)
library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
ID <- rep(c("A","B","C","D", "E"), 100)
df <- data.frame(date = date,
x = runif(length(date), min = 60000, max = 80000),
y = runif(length(date), min = 800000, max = 900000),
ID)
df_ID <- split(df, df$ID)
df_nested <- lapply(df_ID, function(x){
x %>%
arrange(ID) %>%
# Creates a new column assigning the first day in the 10-day interval in which
# the date falls under (e.g., 01-01-2010 would be in the first 10-day interval
# so the `floor_date` assigned to it would be 01-01-2010)
mutate(new = floor_date(date, "10 days")) %>%
# For any months that has 31 days, the 31st day would normally be assigned its
# own interval. The code below takes the 31st day and joins it with the
# previous interval.
mutate(new = if_else(day(new) == 31, new - days(10), new)) %>%
group_by(new, .add = TRUE) %>%
group_split()
})
I would do it like this:
set.seed(12345)
library(lubridate)
library(tidyverse)
f = function(data){
data %>% mutate(
new = floor_date(data$date, "10 days"),
new = if_else(day(new) == 31, new - days(10), new)
)
}
tibble(
ID = rep(c("A","B","C","D", "E"), 100),
date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
x = runif(length(date), min = 60000, max = 80000),
y = runif(length(date), min = 800000, max = 900000)
) %>% group_by(ID) %>%
nest() %>%
mutate(data = map(data, f)) %>%
unnest(data)
output
# A tibble: 500 x 5
# Groups: ID [5]
ID date x y new
<chr> <date> <dbl> <dbl> <date>
1 A 2010-01-01 74418. 820935. 2010-01-01
2 A 2010-01-06 63327. 885896. 2010-01-01
3 A 2010-01-11 60691. 873949. 2010-01-11
4 A 2010-01-16 69250. 868411. 2010-01-11
5 A 2010-01-21 69075. 876142. 2010-01-21
6 A 2010-01-26 67797. 829892. 2010-01-21
7 A 2010-01-31 75860. 843542. 2010-01-21
8 A 2010-02-05 67233. 882318. 2010-02-01
9 A 2010-02-10 75644. 826283. 2010-02-01
10 A 2010-02-15 66424. 853789. 2010-02-11
Simple and clear, isn't it?
Everything you want to do with data is included in the f
function. You can extend it as needed.
The rest is done in a simple scheme
tibble %>% group_by %>% nest % mutate %>% unnest