How to split dataframes of a list then join them based on a variable

I have scattered data sets with different observations from different years in a list I want to group observations from the same years in the same datasets.

mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))

mydatalist<-list(mydata1=mydata1,
     mydata2=mydata2,
     mydata3=mydata3,
     mydata4=mydata4)

Normally, the code below allows me to do this, but that's not what I want. If I understand this method correctly, all the datasets are bound first, then split by year.

library(dplyr)

bind_rows(mydatalist) %>% 
  split(f = as.factor(.$year))

But I don't have enough space in my computer memory to join all my actual datasets

What I want to do is first split all the datasets by year, then join them back together.

I want something that does this internally:

$[[1]]
  ID year Weight
1  1 2011     50

$[[2]]
  ID year Weight
1  2 2012     51

$[[3]]
  ID year Weight
1  3 2013     52

$[[4]]
  ID year Weight
1  4 2014     53

$[[5]]
  ID year Weight
5  5 2015     54

$[[6]]
  ID year Weight
1  6 2011     52
2  9 2011     55

$[[7]]
  ID year Weight
1  7 2012     53

$[[8]]
  ID year Weight
1  8 2013     54

$[[9]]
  ID year Weight
1 10 2014     56

......... etc

before getting this format:

$`2011`
   ID year Weight
1   1 2011     50
6   6 2011     52
9   9 2011     55
11 11 2011     61
14 14 2011     64

$`2012`
   ID year Weight
2   2 2012     51
7   7 2012     53
12 12 2012     62
16 16 2012     70

$`2013`
   ID year Weight
3   3 2013     52
8   8 2013     54
13 13 2013     63
17 17 2013     71
20 20 2013     74

$`2014`
   ID year Weight
4   4 2014     53
10 10 2014     56
18 18 2014     72

$`2015`
   ID year Weight
5   5 2015     54
15 15 2015     65
19 19 2015     73

Solution

Here's an option using purrr:

If you want to do this in intermediate steps, you can just assign parts of the chained (%>%) operations to a variable, e.g.:

step1 <- mydatalist %>% 
  map(~split(.x, .x$year)) 

step2 <- step1 %>% 
  list_flatten(name_spec = "{inner}") %>% 
  list_rbind() %>% 
  split(.$year)

library(tidyverse)
library(purrr)

mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))

mydatalist<-list(mydata1=mydata1,
                 mydata2=mydata2,
                 mydata3=mydata3,
                 mydata4=mydata4)



mydatalist %>% 
  map(~split(.x, .x$year)) %>% 
  list_flatten(name_spec = "{inner}") %>% 
  list_rbind() %>% 
  split(.$year)

#> $`2011`
#>    ID year Weight
#> 1   1 2011     50
#> 6   6 2011     52
#> 7   9 2011     55
#> 11 11 2011     61
#> 12 14 2011     64
#> 
#> $`2012`
#>    ID year Weight
#> 2   2 2012     51
#> 8   7 2012     53
#> 13 12 2012     62
#> 16 16 2012     70
#> 
#> $`2013`
#>    ID year Weight
#> 3   3 2013     52
#> 9   8 2013     54
#> 14 13 2013     63
#> 17 17 2013     71
#> 18 20 2013     74
#> 
#> $`2014`
#>    ID year Weight
#> 4   4 2014     53
#> 10 10 2014     56
#> 19 18 2014     72
#> 
#> $`2015`
#>    ID year Weight
#> 5   5 2015     54
#> 15 15 2015     65
#> 20 19 2015     73

^{Created on 2023-04-15 by the reprex package (v2.0.1)}