Search code examples
rdataframe

R Merge two dataframes with same columns


Imagine you have two dataframes in R, for example. Imagine that columns could have any kind of names, not always a or b. Also there could be much more columns, dozens or even hundreds.

a <-
  data.frame(
    id = seq(1, 4),
    a = c(0, 0, 1, 1),
    b = c(3, 3, 1, 1)
  )


b <-
  data.frame(
    id = seq(1, 4),
    a = c("0%", "0%", "50%", "50%"),
    b = c("37.5%", "37.5%", "12.5%", "12.5%")
  )

I'd like to merge it to following result:

result <-
  data.frame(
    id = seq(1, 4),
    a = c("0 (0%)", "0 (0%)", "1 (50%)", "1 (50%)"),
    b = c("3 (37.5%)", "3 (37.5%)", "1 (12.5%)", "1 (12.5%)")
  )

Solution

  • This seems like a join with some string-concatenation:

    library(dplyr)
    full_join(a, b, by = "id") %>%
      mutate(across(ends_with(".x"), ~ sprintf("%s (%s)", .x, pick(sub("x$", "y", cur_column()))[[1]]), .names = "{sub('.x$', '', .col)}")) %>%
      select(-matches("\\.[xy]$"))
    #   id       a         b
    # 1  1  0 (0%) 3 (37.5%)
    # 2  2  0 (0%) 3 (37.5%)
    # 3  3 1 (50%) 1 (12.5%)
    # 4  4 1 (50%) 1 (12.5%)
    

    It's certainly possible to do this in base R or data.table, if needed, though with a little more code.

    tmp <- merge(a, b, by = "id")
    tmp
    #   id a.x b.x a.y   b.y
    # 1  1   0   3  0% 37.5%
    # 2  2   0   3  0% 37.5%
    # 3  3   1   1 50% 12.5%
    # 4  4   1   1 50% 12.5%
    xs <- grep("\\.x$", names(tmp))
    ys <- grep("\\.y$", names(tmp))
    Map(function(m, n) sprintf("%s (%s)", m, n), tmp[,xs], tmp[,ys]) |>
      setNames(sub(".x$", "", xs)) |>
      cbind(tmp[,-c(xs, ys), drop=FALSE])
    #         2         3 id
    # 1  0 (0%) 3 (37.5%)  1
    # 2  0 (0%) 3 (37.5%)  2
    # 3 1 (50%) 1 (12.5%)  3
    # 4 1 (50%) 1 (12.5%)  4