
Turn a list with elements of unequal length into a two column dataframe


I have a list with 3 elements, each holding a different set and number of values. I would like to turn this list into a simple two-column data frame.

One column would hold the values from the list elements, and the second column would hold the name of the list element each value came from.

myList <- list(A = c(1,2,3),
               B = c(10,20,30,40),
               C = c(100,200,300,400,500))

So the ideal outcome is something like:

Value     List
1         A
2         A
10        B
100       C
......

So I know I can do this with a series of rbinds:

library(magrittr)  # provides the %>% pipe (dplyr re-exports it too)
df <- data.frame(Value = myList[["A"]], cluster = "A") %>%
  rbind(data.frame(Value = myList[["B"]], cluster = "B")) %>%
  rbind(data.frame(Value = myList[["C"]], cluster = "C"))

And I can probably clean this up with a loop or lapply...but it seems like there should be a more straightforward way to get this!


Solution

  • If you want to use the tidyverse (I am not sure it can be done with dplyr alone), you can use:

    library(magrittr)
    tibble::enframe(myList) %>% tidyr::unnest(cols = value)
    

    output

    # A tibble: 12 x 2
       name  value
       <chr> <dbl>
     1 A         1
     2 A         2
     3 A         3
     4 B        10
     5 B        20
     6 B        30
     7 B        40
     8 C       100
     9 C       200
    10 C       300
    11 C       400
    12 C       500
    

    First, tibble::enframe(myList) returns a tibble with two columns and three rows. The name column holds the name of each element of your original list, and value is a list-column in which each entry is the numeric vector stored in that element.

    Then, tidyr::unnest(cols = value) expands the value list-column so that every individual value gets its own row, with name repeated alongside it.
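
    To make the two steps concrete, here is a small sketch that inspects the intermediate result. It assumes the tibble and tidyr packages are installed; the variable names nested and flat are only illustrative and do not come from the original answer.

    ```r
    myList <- list(A = c(1, 2, 3),
                   B = c(10, 20, 30, 40),
                   C = c(100, 200, 300, 400, 500))

    # Step 1: one row per list element; `value` is a list-column of vectors
    nested <- tibble::enframe(myList)
    nrow(nested)            # 3 rows, one per list element
    lengths(nested$value)   # the vectors hold 3, 4 and 5 values

    # Step 2: expand the list-column into one row per individual value
    flat <- tidyr::unnest(nested, cols = value)
    nrow(flat)              # 12 rows in total
    ```

    Writing the call without the pipe, as above, also means magrittr does not need to be loaded.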


    That said, I encourage you to consider @akrun's answer: utils::stack(myList) is less verbose and considerably faster, with a median time roughly ten times lower than the tidyverse version in the benchmark below.
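
    For reference, a minimal sketch of the base R route might look like the following. The renaming step is my own addition to match the column names asked for in the question and may not be part of @akrun's answer; stacked and df are illustrative names.

    ```r
    myList <- list(A = c(1, 2, 3),
                   B = c(10, 20, 30, 40),
                   C = c(100, 200, 300, 400, 500))

    # stack() returns a data frame with two columns: `values` and `ind`
    stacked <- utils::stack(myList)

    # Rename the columns to those used in the question (an assumption about
    # the desired output, not something stack() does by itself)
    df <- setNames(stacked, c("Value", "List"))
    head(df, 4)
    ```

    Note that stack() stores the element names as a factor, so wrap the column in as.character() if you need plain strings.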

    (edited to add @Martin Gal's approach using purrr)

    microbenchmark::microbenchmark(
       tidyverse = tibble::enframe(myList) %>% tidyr::unnest(cols = value),
       baseR = utils::stack(myList),
       purrr = purrr::map_df(myList, ~data.frame(value = .x), .id = "id"),
       times = 10000
    )
    

    output

    Unit: microseconds
         expr      min       lq      mean    median        uq       max neval
     tidyverse 1937.067 2169.251 2600.4402 2301.1385 2592.7305 77715.238 10000
         baseR  144.218  182.112  227.6124  202.0755  230.0960  5476.169 10000
         purrr  350.265  417.803  523.7954  455.4410  520.3555 71673.820 10000