Convert list of lists to single dataframe with first column filled by first value (for each list) in R


I have a list of lists, like so:

x <- list()
x[[1]] <- c('97', '342', '333')
x[[2]] <- c('97', '555', '556', '742', '888')
x[[3]] <- c('100', '442', '443', '444', '445', '446')

The first number in each list (97, 97, 100) refers to a node in a tree and the following numbers refer to traits associated with that node.

My goal is to create a dataframe that looks like this:

df <- data.frame(node = c('97', '97', '97', '97', '97', '97', '100', '100', '100', '100', '100'),
                 trait = c('342', '333', '555', '556', '742', '888', '442', '443', '444', '445', '446'))

where each trait has its corresponding node.

I think the first thing I need to do is convert the list of lists into a single dataframe. I've tried doing so using:

do.call(rbind,x)

but that repeats the values in x[[1]] and x[[2]] to match the length of x[[3]]. I've also tried using:

library(purrr)
library(data.table)
dt_list <- map(x, as.data.table)
dt <- rbindlist(dt_list, fill = TRUE, idcol = TRUE)

I think this gets me closer, but I'm still unsure how to assign the first (node) value of each list to its corresponding trait values. I know this is probably a simple task, but it's stumping me today!


Solution

  • For each vector, you can create a data frame with its first value in the column 'node' and the remaining values in the column 'trait'. Applying this to every entry in the list with the map_df() function from the purrr package row-binds the pieces and gives the output you describe.

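    library(purrr)
    library(dplyr)  # map_df() relies on dplyr to row-bind the results
    
    x %>%
      # vec[1] is the node, vec[-1] are its traits; the node is recycled per trait
      map_df(function(vec) data.frame(node = vec[1],
                                      trait = vec[-1],
                                      stringsAsFactors = FALSE))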
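
If you would rather not depend on purrr, the same idea can be sketched in base R with lapply() and do.call(rbind, ...). This is only an illustration, not part of the original answer: the intermediate name `pieces` is made up here, and the sketch assumes every vector holds one node followed by at least one trait, as in the question's data.

```r
# Example input from the question: each vector is c(node, trait, trait, ...)
x <- list(
  c('97', '342', '333'),
  c('97', '555', '556', '742', '888'),
  c('100', '442', '443', '444', '445', '446')
)

# Build one small data frame per vector: the first value becomes the node
# (recycled to the number of traits) and the rest become the traits
pieces <- lapply(x, function(vec) {
  data.frame(node = vec[1], trait = vec[-1], stringsAsFactors = FALSE)
})

# Stack the per-vector data frames row-wise into a single data frame
df <- do.call(rbind, pieces)
```

Here do.call(rbind, ...) works as intended because it is binding data frames that already have matching columns, rather than vectors of different lengths.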