Search code examples
rdplyrmerge

R joining data frame by index in a string r


I'm trying to merge/join the two dataframes df and df2.

df was generated from a character vector that was cut at each position (4th, 10th, 12th, ... character) into string. The initial vector looked like this

vec1 <- paste(df$string, collapse = "").

df2 has name that correspond to some of the characters in vec1. For instance, the third and fifth characters in vec1 are P and A, and their names are pear and apple, respectively.

df <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
             "string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame("character" = c("P", "A", "L", "A", "P", "G"),
              "position" = c(3, 5, 9, 21, 26, 29),
              "name" = c("pear", "apple", "lemon", "apricot", "peach", "grape"))

I want to combine df and df2 into df3 to show which string in df has name, like below. What would be a good way to do this?

df3 <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
              "string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"),
              "name" = c("pear", "apple, lemon", NA, NA, "apricot, peach", "grape"))

Solution

  • Here, I join df to a version of itself where the characters have been exploded into their individual position and joined to df2, summarizing to keep the matches in one combined string per position.

    library(tidyverse)
    df |>
      left_join(
        df |>
          separate_longer_position(string, 1) |>
          mutate(pos = row_number()) |>
          left_join(df2, join_by(pos == position, string == character)) |>
          filter(!is.na(name)) |>
          summarize(name = paste(name, collapse = ","), .by = position))
    

    Result

    Joining with `by = join_by(position)`
      position   string          name
    1        4     MPPA          pear
    2       10   APARLA   apple,lemon
    3       12       LA          <NA>
    4       20 LGLGLWLG          <NA>
    5       27  ALAGGPG apricot,peach
    6       30      RGC         grape