I have scattered data sets with different observations from different years in a list I want to group observations from the same years in the same datasets.
mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))
mydatalist<-list(mydata1=mydata1,
mydata2=mydata2,
mydata3=mydata3,
mydata4=mydata4)
Normally, the code below allows me to do this, but that's not what I want. If I understand this method correctly, all the datasets are bound first, then split by year.
library(dplyr)
bind_rows(mydatalist) %>%
split(f = as.factor(.$year))
But I don't have enough space in my computer memory to join all my actual datasets
What I want to do is first split all the datasets by year, then join them back together.
I want something that does this internally:
$[[1]]
ID year Weight
1 1 2011 50
$[[2]]
ID year Weight
1 2 2012 51
$[[3]]
ID year Weight
1 3 2013 52
$[[4]]
ID year Weight
1 4 2014 53
$[[5]]
ID year Weight
5 5 2015 54
$[[6]]
ID year Weight
1 6 2011 52
2 9 2011 55
$[[7]]
ID year Weight
1 7 2012 53
$[[8]]
ID year Weight
1 8 2013 54
$[[9]]
ID year Weight
1 10 2014 56
......... etc
before getting this format:
$`2011`
ID year Weight
1 1 2011 50
6 6 2011 52
9 9 2011 55
11 11 2011 61
14 14 2011 64
$`2012`
ID year Weight
2 2 2012 51
7 7 2012 53
12 12 2012 62
16 16 2012 70
$`2013`
ID year Weight
3 3 2013 52
8 8 2013 54
13 13 2013 63
17 17 2013 71
20 20 2013 74
$`2014`
ID year Weight
4 4 2014 53
10 10 2014 56
18 18 2014 72
$`2015`
ID year Weight
5 5 2015 54
15 15 2015 65
19 19 2015 73
Here's an option using purrr
:
If you want to do this in intermediate steps, you can just assign parts of the chained (%>%)
operations to a variable, e.g.:
step1 <- mydatalist %>%
map(~split(.x, .x$year))
step2 <- step1 %>%
list_flatten(name_spec = "{inner}") %>%
list_rbind() %>%
split(.$year)
library(tidyverse)
library(purrr)
mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))
mydatalist<-list(mydata1=mydata1,
mydata2=mydata2,
mydata3=mydata3,
mydata4=mydata4)
mydatalist %>%
map(~split(.x, .x$year)) %>%
list_flatten(name_spec = "{inner}") %>%
list_rbind() %>%
split(.$year)
#> $`2011`
#> ID year Weight
#> 1 1 2011 50
#> 6 6 2011 52
#> 7 9 2011 55
#> 11 11 2011 61
#> 12 14 2011 64
#>
#> $`2012`
#> ID year Weight
#> 2 2 2012 51
#> 8 7 2012 53
#> 13 12 2012 62
#> 16 16 2012 70
#>
#> $`2013`
#> ID year Weight
#> 3 3 2013 52
#> 9 8 2013 54
#> 14 13 2013 63
#> 17 17 2013 71
#> 18 20 2013 74
#>
#> $`2014`
#> ID year Weight
#> 4 4 2014 53
#> 10 10 2014 56
#> 19 18 2014 72
#>
#> $`2015`
#> ID year Weight
#> 5 5 2015 54
#> 15 15 2015 65
#> 20 19 2015 73
Created on 2023-04-15 by the reprex package (v2.0.1)