Search code examples
rmergeconcatenation

Merge where some ids are concatenated in single column


I have a dataframe with a column of ids, but for some rows there are multiple ids concatenated together. I want to merge this onto another dataframe using the id, and when the ids are concatenated it handles that and reflects it by having the values in the new columns added also concatenated.

For example I have dataframes

data <- data.frame(
    id = c(1, 4, 3, "2,3", "1,4"),
    value = c(1:5)
)
> data
   id value
1   1     1
2   4     2
3   3     3
4 2,3     4
5 1,4     5

mapping <- data.frame(
    id = 1:4,
    name = c("one", "two", "three", "four")
)
> mapping
  id  name
1  1   one
2  2   two
3  3 three
4  4  four

I would like to end up with

   id value      name
1   1     1       one
2   4     2      four
3   3     3     three
4 2,3     4 two,three
5 1,4     5  one,four

Solution

  • I don't think there's a good way to do this other than to separate, join, and re-concatenate:

    library(dplyr)
    library(tidyr)
    data %>%
      mutate(true_id = row_number()) %>%
      separate_rows(id, convert = TRUE) %>%
      left_join(mapping, by = "id") %>%
      group_by(true_id, value) %>%
      summarize(id = toString(id), name = toString(name), .groups = "drop")
    # # A tibble: 5 × 4
    #   true_id value id    name      
    #     <int> <int> <chr> <chr>     
    # 1       1     1 1     one       
    # 2       2     2 4     four      
    # 3       3     3 3     three     
    # 4       4     4 2, 3  two, three
    # 5       5     5 1, 4  one, four 
    

    I wasn't sure if your value column would actually be unique, so I added a true_id just in case.