Search code examples
rdataframelistsplitbind

How to split dataframes of a list then join them based on a variable


I have scattered data sets with different observations from different years in a list I want to group observations from the same years in the same datasets.

mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))

mydatalist<-list(mydata1=mydata1,
     mydata2=mydata2,
     mydata3=mydata3,
     mydata4=mydata4)

Normally, the code below allows me to do this, but that's not what I want. If I understand this method correctly, all the datasets are bound first, then split by year.

library(dplyr)

bind_rows(mydatalist) %>% 
  split(f = as.factor(.$year))

But I don't have enough space in my computer memory to join all my actual datasets

What I want to do is first split all the datasets by year, then join them back together.

I want something that does this internally:

$[[1]]
  ID year Weight
1  1 2011     50

$[[2]]
  ID year Weight
1  2 2012     51

$[[3]]
  ID year Weight
1  3 2013     52

$[[4]]
  ID year Weight
1  4 2014     53

$[[5]]
  ID year Weight
5  5 2015     54

$[[6]]
  ID year Weight
1  6 2011     52
2  9 2011     55

$[[7]]
  ID year Weight
1  7 2012     53

$[[8]]
  ID year Weight
1  8 2013     54

$[[9]]
  ID year Weight
1 10 2014     56

......... etc

before getting this format:

$`2011`
   ID year Weight
1   1 2011     50
6   6 2011     52
9   9 2011     55
11 11 2011     61
14 14 2011     64

$`2012`
   ID year Weight
2   2 2012     51
7   7 2012     53
12 12 2012     62
16 16 2012     70

$`2013`
   ID year Weight
3   3 2013     52
8   8 2013     54
13 13 2013     63
17 17 2013     71
20 20 2013     74

$`2014`
   ID year Weight
4   4 2014     53
10 10 2014     56
18 18 2014     72

$`2015`
   ID year Weight
5   5 2015     54
15 15 2015     65
19 19 2015     73

Solution

  • Here's an option using purrr:

    If you want to do this in intermediate steps, you can just assign parts of the chained (%>%) operations to a variable, e.g.:

    step1 <- mydatalist %>% 
      map(~split(.x, .x$year)) 
    
    step2 <- step1 %>% 
      list_flatten(name_spec = "{inner}") %>% 
      list_rbind() %>% 
      split(.$year)
    
    library(tidyverse)
    library(purrr)
    
    mydata1<-data.frame(ID=1:5, year=c(2011:2015),Weight=c(50:54))
    mydata2<-data.frame(ID=6:10, year=c(2011:2013,2011,2014),Weight=c(52:56))
    mydata3<-data.frame(ID=11:15, year=c(2011:2013,2011,2015),Weight=c(61:65))
    mydata4<-data.frame(ID=16:20, year=c(2012:2014,2015,2013),Weight=c(70:74))
    
    mydatalist<-list(mydata1=mydata1,
                     mydata2=mydata2,
                     mydata3=mydata3,
                     mydata4=mydata4)
    
    
    
    mydatalist %>% 
      map(~split(.x, .x$year)) %>% 
      list_flatten(name_spec = "{inner}") %>% 
      list_rbind() %>% 
      split(.$year)
    
    #> $`2011`
    #>    ID year Weight
    #> 1   1 2011     50
    #> 6   6 2011     52
    #> 7   9 2011     55
    #> 11 11 2011     61
    #> 12 14 2011     64
    #> 
    #> $`2012`
    #>    ID year Weight
    #> 2   2 2012     51
    #> 8   7 2012     53
    #> 13 12 2012     62
    #> 16 16 2012     70
    #> 
    #> $`2013`
    #>    ID year Weight
    #> 3   3 2013     52
    #> 9   8 2013     54
    #> 14 13 2013     63
    #> 17 17 2013     71
    #> 18 20 2013     74
    #> 
    #> $`2014`
    #>    ID year Weight
    #> 4   4 2014     53
    #> 10 10 2014     56
    #> 19 18 2014     72
    #> 
    #> $`2015`
    #>    ID year Weight
    #> 5   5 2015     54
    #> 15 15 2015     65
    #> 20 19 2015     73
    

    Created on 2023-04-15 by the reprex package (v2.0.1)