r, list, dataframe, nested-lists, rbind

Convert nested list with different names to data.frame filling NA and adding column


I need a base R solution to convert a nested list whose sublists have different names into a data.frame:

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z=list('k')))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    <NULL>   
##     3    NA    <NULL>   
##    NA     5    <NULL>   
##     9    NA    <chr [1]>

I know this can easily be done with dplyr::bind_rows, or with data.table::rbindlist with fill = TRUE (not ideal, though, since it fills the character column with NULL, not NA), but I really do need a solution in base R. To simplify the problem, a solution for a 2-level nested list with no 3rd-level lists is also fine, such as

mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))

convert(mylist)
## returns a data.frame:
##
##     a     b    z           
##     1     2    NA   
##     3    NA    NA   
##    NA     5    NA   
##     9    NA    k  

I have tried something like

convert <- function(L) as.data.frame(do.call(rbind, L))

This neither fills the missing values with NA nor adds the additional column z.
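
To make the failure concrete, here is a small check of that attempt on the simplified two-level list. It is only a sketch of what I would expect from base R's rbind on lists: each sublist becomes a row, column names are taken from the first sublist, and shorter sublists are recycled rather than padded with NA.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# The attempted conversion from the question
convert <- function(L) as.data.frame(do.call(rbind, L))
res <- convert(mylist)

# Only two columns come back, named after the first sublist;
# there is no z column, and shorter sublists are recycled, not NA-filled
dim(res)
names(res)
```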

Update

mylist here is just a simplified example. In reality I cannot assume the names of the sublist elements (a, b and z in the example), nor the lengths of the sublists (2, 1, 1, 2 in the example).

Here are the assumptions for the expected data.frame and the input mylist:

  1. The number of columns of the expected data.frame is determined by the maximum length of the sublists, which can vary from 1 to several hundred. There is no explicit information about the length of each sublist (which names appear in which sublist is unknown), only this bound:
     max(sapply(mylist, length)) <= 1000 ## ==> TRUE
  2. The number of rows of the expected data.frame is determined by the length of mylist, which can vary from 1 to several thousand:
     dplyr::between(length(mylist), 0, 10000) ## ==> TRUE
  3. There is no explicit information about the names of the sublist elements or their order, so the column names and column order of the expected data.frame can only be determined from mylist itself.
  4. Each sublist contains elements of type numeric, character or list. To simplify the problem, consider only numeric and character.
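
As a small illustration of assumption 3, the column names can be collected from mylist itself. This is a sketch on the simplified example, not a full solution; nms is just an illustrative variable name.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# Collect every name that appears in any sublist, in order of first appearance
nms <- unique(unlist(lapply(mylist, names)))
nms
#> [1] "a" "b" "z"

# Rows come from the length of mylist, columns from the distinct names
c(rows = length(mylist), cols = length(nms))
```

Note that in this example the number of distinct names (3) is larger than the longest sublist (2), so it is the set of distinct names that fixes the columns here.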

Solution

  • A short solution in base R, with the column names a, b and z written into the function, would be:

    make_df <- function(a = NA, b = NA, z = NA) {
      data.frame(a = unlist(a), b = unlist(b), z = unlist(z))
    }
    
    do.call(rbind, lapply(mylist, function(x) do.call(make_df, x)))
    #>    a  b    z
    #> 1  1  2 <NA>
    #> 2  3 NA <NA>
    #> 3 NA  5 <NA>
    #> 4  9 NA    k
    

    Update

    A more general solution, using the same method but without requiring specific names, would be:

    build_data_frame <- function(obj) {
      nms     <- unique(unlist(lapply(obj, names)))
      frmls   <- as.list(setNames(rep(NA, length(nms)), nms))
      dflst   <- setNames(lapply(nms, function(x) call("unlist", as.symbol(x))), nms)
      make_df <- as.function(c(frmls, call("do.call", "data.frame", dflst)))
      
      do.call(rbind, lapply(obj, function(x) do.call(make_df, x)))
    }
    

    This allows:

    build_data_frame(mylist)
    #>    a  b    z
    #> 1  1  2 <NA>
    #> 2  3 NA <NA>
    #> 3 NA  5 <NA>
    #> 4  9 NA    k
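
As a quick sanity check of the general approach, the sketch below runs the same function on a list with different names. It assumes the function iterates over its own argument obj instead of the global mylist; other_list and its names x and y are made up for illustration.

```r
build_data_frame <- function(obj) {
  # All names that occur in any sublist become the columns
  nms     <- unique(unlist(lapply(obj, names)))
  # Formal arguments of the generated function, each defaulting to NA
  frmls   <- as.list(setNames(rep(NA, length(nms)), nms))
  # One unlist(<name>) call per column
  dflst   <- setNames(lapply(nms, function(x) call("unlist", as.symbol(x))), nms)
  # Generated function: function(x = NA, y = NA, ...) data.frame(x = unlist(x), ...)
  make_df <- as.function(c(frmls, call("do.call", "data.frame", dflst)))

  # One single-row data.frame per sublist, stacked with rbind
  do.call(rbind, lapply(obj, function(x) do.call(make_df, x)))
}

# Hypothetical input with names that differ from the original example
other_list <- list(list(x = 1), list(y = "u", x = 2))
df <- build_data_frame(other_list)
df
```

Names missing from a sublist fall back to the NA default of the generated function, which is what produces the NA fill.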