Tags: r, dplyr, purrr, nested-lists

Binding a nested list with empty elements, duplicate names and inconsistent structure into one tibble in R


I have a large nested list that contains other lists, which in turn contain tibbles. After cleaning the elements, I am left with a lot of empty tibbles and lists.

Now I want to bring my results into one single data frame, but I get the error message:

Argument x must have names.

I am aware that the empty elements screw up my attempts to bind rows, but I cannot find a method to drop all empty tibbles.

This is the structure of my data:

library(tibble)
library(dplyr)

a <- tibble(1:7,
            letters[7:1])

b <- tibble(1:7)

# empty tibble: 2 rows, 0 columns
c <- tibble(.rows = 2)

riddle <- list(list(a, b, c), list(list(a, b, c)), list(c), c)

bind_rows(riddle)

Please note that this is just a minimized version, so any manual editing or deleting of elements won't work on my original data.

Any solution with purrr would be extra useful! :)

Thanks for any help in advance!!


EDIT:

The answer from @akrun solved the problem above, but I found out that the main issue with my data is that variables with the same name have different types. Converting them to a common character type should work.

I've reproduced the error in a slightly changed example:

a <- tibble(a = numeric(7),
            b = letters[7:1],
            c = integer(length = 1))

b <- tibble(a = integer(length = 1),
            b = as.numeric(8),
            c = letters[7:1])

c <- tibble(.rows = 2)

riddle <- list(list(a, b, c), list(list(a, b, c)), list(b, c), c)


map_dfr(riddle, bind_rows)
Error in `.f()`:
! Can't combine `..1$b` <character> and `..2$b` <double>.

Thanks for any ideas


EDIT:

Another necessary step was applying make.unique(), as my original data contains duplicate column names, which cannot be merged with bind_rows, as explained and answered by @akrun below.


Solution

  • If we want to get a single dataset, loop over the list with map and then use bind_rows

    library(purrr)
    library(dplyr)
    map_dfr(riddle, bind_rows)
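
    As an alternative, the nested list can be flattened into a plain list of data frames before binding. This is only a minimal sketch and not part of the original answer: flatten_dfs and the example objects a, b and nested are hypothetical names, and the example deliberately contains no empty tibbles.

    ```r
    library(dplyr)
    library(tibble)

    # Hypothetical example data with named columns and no empty tibbles
    a <- tibble(x = 1:7, y = letters[7:1])
    b <- tibble(x = 1:7)
    nested <- list(list(a, b), list(list(a, b)))

    # Hypothetical helper: walk the list recursively and collect
    # every data frame into one flat list
    flatten_dfs <- function(x) {
      if (is.data.frame(x)) return(list(x))
      do.call(c, lapply(x, flatten_dfs))
    }

    flat <- flatten_dfs(nested)

    # Columns missing in some data frames are filled with NA
    combined <- bind_rows(flat)
    ```

    Flattening first makes the binding step independent of how deep the nesting goes.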
    

    If the intention is to remove the datasets that have 0 rows or 0 columns, use a recursive function (here rrapply) to keep only the data frames whose dim values are all greater than 0

    library(rrapply)
    riddle2 <- rrapply(riddle, condition = function(x) all(dim(x) > 0),
                       classes = "data.frame", how = "prune")
    

    -compare the structure

    #riddle2
    > str(riddle2)
    List of 2
     $ :List of 2
      ..$ : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
      .. ..$ 1:7         : int [1:7] 1 2 3 4 5 6 7
      .. ..$ letters[7:1]: chr [1:7] "g" "f" "e" "d" ...
      ..$ : tibble [7 × 1] (S3: tbl_df/tbl/data.frame)
      .. ..$ 1:7: int [1:7] 1 2 3 4 5 6 7
     $ :List of 1
      ..$ :List of 2
      .. ..$ : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
      .. .. ..$ 1:7         : int [1:7] 1 2 3 4 5 6 7
      .. .. ..$ letters[7:1]: chr [1:7] "g" "f" "e" "d" ...
      .. ..$ : tibble [7 × 1] (S3: tbl_df/tbl/data.frame)
      .. .. ..$ 1:7: int [1:7] 1 2 3 4 5 6 7
    
    #riddle
    > str(riddle)
    List of 4
     $ :List of 3
      ..$ : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
      .. ..$ 1:7         : int [1:7] 1 2 3 4 5 6 7
      .. ..$ letters[7:1]: chr [1:7] "g" "f" "e" "d" ...
      ..$ : tibble [7 × 1] (S3: tbl_df/tbl/data.frame)
      .. ..$ 1:7: int [1:7] 1 2 3 4 5 6 7
      ..$ : tibble [2 × 0] (S3: tbl_df/tbl/data.frame)
     Named list()
     $ :List of 1
      ..$ :List of 3
      .. ..$ : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
      .. .. ..$ 1:7         : int [1:7] 1 2 3 4 5 6 7
      .. .. ..$ letters[7:1]: chr [1:7] "g" "f" "e" "d" ...
      .. ..$ : tibble [7 × 1] (S3: tbl_df/tbl/data.frame)
      .. .. ..$ 1:7: int [1:7] 1 2 3 4 5 6 7
      .. ..$ : tibble [2 × 0] (S3: tbl_df/tbl/data.frame)
     Named list()
     $ :List of 1
      ..$ : tibble [2 × 0] (S3: tbl_df/tbl/data.frame)
     Named list()
     $ : tibble [2 × 0] (S3: tbl_df/tbl/data.frame)
     Named list()
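
    Since the question asks for purrr, the same kind of pruning can be sketched without rrapply. This is an illustrative sketch rather than the method used above: is_empty_df and prune_empty are hypothetical helper names, and the tibble called empty stands in for c from the question.

    ```r
    library(purrr)
    library(tibble)

    a <- tibble(1:7, letters[7:1])
    b <- tibble(1:7)
    empty <- tibble(.rows = 2)

    riddle <- list(list(a, b, empty), list(list(a, b, empty)), list(empty), empty)

    # Hypothetical helper: TRUE for a data frame with zero rows or zero columns
    is_empty_df <- function(x) is.data.frame(x) && any(dim(x) == 0)

    # Hypothetical helper: recursively drop empty data frames,
    # then drop sub-lists that became empty as a result
    prune_empty <- function(x) {
      if (is.data.frame(x)) return(x)
      x <- discard(x, is_empty_df)
      x <- map(x, prune_empty)
      discard(x, function(el) !is.data.frame(el) && length(el) == 0)
    }

    pruned <- prune_empty(riddle)
    str(pruned, max.level = 2)
    ```

    The nesting of the surviving elements is kept, which mirrors how = "prune" rather than how = "flatten".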
    

    For the updated version with different types, we can convert all columns to a single type (character), bind the rows, and then use type.convert to restore suitable column types

    rrapply(riddle, condition = function(x) all(dim(x) > 0),
            f = function(x) x %>%
              mutate(across(everything(), as.character)),
            classes = "data.frame", how = "flatten") %>%
      bind_rows() %>%
      type.convert(as.is = TRUE)
    

    -output

    # A tibble: 35 × 3
           a b     c    
       <int> <chr> <chr>
     1     0 g     0    
     2     0 f     0    
     3     0 e     0    
     4     0 d     0    
     5     0 c     0    
     6     0 b     0    
     7     0 a     0    
     8     0 8     g    
     9     0 8     f    
    10     0 8     e    
    # … with 25 more rows
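
    The convert, bind, and re-convert idea can be seen in isolation on two small tibbles. This is a minimal sketch with hypothetical objects x and y, not taken from the data above.

    ```r
    library(dplyr)
    library(tibble)

    x <- tibble(id = c(1, 2), val = c("g", "f"))  # val is character
    y <- tibble(id = 3L, val = 8)                 # val is double

    # bind_rows(x, y) fails here because `val` is character in x and double in y

    combined <- list(x, y) %>%
      # make every column character so the types are compatible
      lapply(function(df) mutate(df, across(everything(), as.character))) %>%
      bind_rows() %>%
      # let R guess the column types again; as.is = TRUE keeps strings as character
      type.convert(as.is = TRUE)

    str(combined)
    ```

    A column that contains only numbers after binding is converted back to a numeric type, while a column that mixes letters and numbers, such as val, stays character.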
    

    If there are duplicate column names, we can make them unique with make.unique, because many functions, including dplyr verbs such as mutate, return an error when a data frame has duplicate column names

    rrapply(riddle, condition = function(x) all(dim(x) > 0),
            f = function(x) {
              # change to unique column names
              names(x) <- make.unique(names(x))
              x %>%
                # convert all columns to character in case the column
                # types are mismatched across list elements
                mutate(across(everything(), as.character))
            },
            classes = "data.frame", how = "flatten") %>%
      # bind the flattened list of data frames/tibbles into a single dataset
      bind_rows() %>%
      # do the column type conversion
      type.convert(as.is = TRUE)
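
    To see what make.unique does on its own, here is a small base R sketch; the data frame df is a hypothetical example, not the original data.

    ```r
    # check.names = FALSE lets data.frame keep the duplicate column name
    df <- data.frame(a = 1:2, a = 3:4, b = 5:6, check.names = FALSE)
    names(df)
    # "a" "a" "b"

    # make.unique appends .1, .2, ... to repeated names
    names(df) <- make.unique(names(df))
    names(df)
    # "a" "a.1" "b"
    ```

    The first occurrence of a name is left unchanged, so columns that were already unique keep their names.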