
Convert dataframe with A and B columns into list of As with list of unique B values


I have a large data.frame like this:

+--------+---------+
| A      | B       |
+--------+---------+
| USA    | Chicago |
+--------+---------+
| USA    | Chicago |
+--------+---------+
| France | Paris   |
+--------+---------+
| Italy  | Rome    |
+--------+---------+
| France | Nice    |
+--------+---------+
| Italy  | Venice  |
+--------+---------+

i.e.

AB <- structure(list(A = c("USA", "France", "Italy", "France", "Italy", 
"USA"), B = c("Chicago", "Paris", "Rome", "Nice", "Venice", "Chicago"
)), row.names = c(NA, -6L), class = "data.frame")

and I would like to create a list like this:

list(USA = list("Chicago"), France = list("Paris", "Nice"), Italy = list(
    "Rome", "Venice"))

Here's what I'm doing now.

unique.As <- unique(AB$A)
ABL <- lapply(unique.As, function(current.A) {
  return(unique(AB$B[AB$A == current.A]))
})
names(ABL) <- unique.As
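One detail worth noting: the loop above returns a character vector per country, whereas the target shown earlier nests each city in its own list element. A minimal sketch of the same loop that wraps each result in as.list follows; the names unique_As and a are illustrative, not from the original code.

```r
# Rebuild the example data so the snippet runs on its own.
AB <- data.frame(
  A = c("USA", "France", "Italy", "France", "Italy", "USA"),
  B = c("Chicago", "Paris", "Rome", "Nice", "Venice", "Chicago"),
  stringsAsFactors = FALSE
)

unique_As <- unique(AB$A)

# For each distinct A, collect its distinct B values and turn the
# resulting character vector into a list of length-one strings.
ABL <- lapply(unique_As, function(a) as.list(unique(AB$B[AB$A == a])))
names(ABL) <- unique_As

str(ABL)
```

Because it iterates over unique(AB$A), the result keeps the countries in order of first appearance.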

Edit

I previously wrote that listifying a data.frame with 65k rows took about 10 minutes. I realized today that almost all of that time came from another step in the lapply loop that I didn't show above.

akrun's solution below is still faster and more elegant!


Solution

  • split should be faster than the lapply loop:

    lst1 <- split(as.list(AB$B), AB$A)
    

    This keeps duplicated values (USA gets "Chicago" twice). If the intention is to keep only unique key/value pairs, drop the duplicated rows first:

    lst1 <- with(unique(AB), split(as.list(B), A))
    

    Or, equivalently, using duplicated:

    with(AB[!duplicated(AB), ], split(as.list(B), A))
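split converts a non-factor grouping vector to a factor, so the groups come back in sorted level order (France, Italy, USA) rather than in order of first appearance. If the order matters, one option is to pass an explicit factor. The sketch below assumes that is wanted and compares the result against the target from the question; AB_unique, lst_ordered and target are illustrative names.

```r
# Rebuild the example data so the snippet runs on its own.
AB <- data.frame(
  A = c("USA", "France", "Italy", "France", "Italy", "USA"),
  B = c("Chicago", "Paris", "Rome", "Nice", "Venice", "Chicago"),
  stringsAsFactors = FALSE
)

# Drop duplicated (A, B) rows first.
AB_unique <- unique(AB)

# Split on a factor whose levels follow first appearance in A,
# so the list comes back as USA, France, Italy.
lst_ordered <- split(
  as.list(AB_unique$B),
  factor(AB_unique$A, levels = unique(AB_unique$A))
)

# The structure requested in the question, for comparison.
target <- list(
  USA = list("Chicago"),
  France = list("Paris", "Nice"),
  Italy = list("Rome", "Venice")
)

print(identical(lst_ordered, target))
```

If the order of the groups does not matter, the plain split calls above are simpler and skip the factor step.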