Search code examples
rtidyversepurrr

Convert a list into a tibble with nested columns


I would like to convert a list like this into a tibble.

lst <- list(
  "A"=list(
    "Category"="A",
    "Team"=c("x"),
    "City"="N"
  ),
  "C"=list(
    "Category"="C",
    "Team"=c(
      "x","v","z"
    ),
    "City"="V"
  ),
  "F"=list(
    "Category"="F",
    "Team"=c("o")
  ),
  "K"=list(
    "Category"="K",
    "Team"=c(
      "f","h","y"
    ),
    "City"="T"
  ),
  "B"=list(
    "Category"="B",
    "City"=c("Q")
  )
)

There can be multiple items in the list. I have only three in this example: Category, Team and City. Category will always have only one unique value without duplication while the other fields can be one, many or missing.

I need one unique Category per row. Any other column that has multiple values for a category must be come a nested list.

This is the output I am looking for:

  Category Team      City 
  <chr>    <list>    <chr>
1 A        <chr [1]> N    
2 C        <chr [3]> V    
3 F        <chr [1]> NA   
4 K        <chr [3]> T    
5 B        <chr [1]> Q  

This was manually created by

library(purrr)
library(dplyr)

d <- map_dfr(lst,~as_tibble(.x))
reframe(d,across(Team,list),.by=Category) |>
  left_join(distinct(select(d,-Team),Category,.keep_all=TRUE), by="Category")

I am looking for a solution that would work for any number of fields/columns and any number of items in any column (except Category which will have only one item).

One of the many failed attempts:

map_dfr(lst,~as_tibble(.x)) |>
  group_by(Category) |> 
  summarise(
    across(everything(), ~ {
      if (n_distinct(.) == 1) {
        first(.)
      } else {
        list(.)
      }
    })
  )

Error in `summarise()`:
ℹ In argument: `across(...)`.
Caused by error:
! `Team` must return compatible vectors across groups.
ℹ Result of type <character> for group A: `Category = "A"`.
ℹ Result of type <list> for group C: `Category = "C"`.
Run `rlang::last_trace()` to see where the error occurred.

Session

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Ubuntu 22.04.4 LTS
dplyr_1.1.4
tidyr_1.3.1
purrr_1.0.2

Solution

  • An approach using unnest_wider

    library(dplyr)
    library(tidyr)
    
    tibble(lst) %>% unnest_wider(lst)
    # A tibble: 5 × 3
      Category Team      City 
      <chr>    <list>    <chr>
    1 A        <chr [1]> N    
    2 C        <chr [3]> V    
    3 F        <chr [1]> NA   
    4 K        <chr [3]> T    
    5 B        <NULL>    Q
    

    Note that the non-existing Team is kept as NULL.