I'm trying to merge/join the two dataframes df
and df2
.
df
was generated from a character vector that was cut at each position
(4th, 10th, 12th, ... character) into string
. The initial vector looked like this
vec1 <- paste(df$string, collapse = "")
.
df2
has name
that correspond to some of the characters in vec1
. For instance, the third and fifth characters in vec1
are P
and A
, and their names
are pear
and apple
, respectively.
df <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
"string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame("character" = c("P", "A", "L", "A", "P", "G"),
"position" = c(3, 5, 9, 21, 26, 29),
"name" = c("pear", "apple", "lemon", "apricot", "peach", "grape"))
I want to combine df
and df2
into df3
to show which string
in df
has name
, like below. What would be a good way to do this?
df3 <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
"string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"),
"name" = c("pear", "apple, lemon", NA, NA, "apricot, peach", "grape"))
Here, I join df
to a version of itself where the characters have been exploded into their individual position and joined to df2
, summarizing to keep the matches in one combined string per position.
library(tidyverse)
df |>
left_join(
df |>
separate_longer_position(string, 1) |>
mutate(pos = row_number()) |>
left_join(df2, join_by(pos == position, string == character)) |>
filter(!is.na(name)) |>
summarize(name = paste(name, collapse = ","), .by = position))
Result
Joining with `by = join_by(position)`
position string name
1 4 MPPA pear
2 10 APARLA apple,lemon
3 12 LA <NA>
4 20 LGLGLWLG <NA>
5 27 ALAGGPG apricot,peach
6 30 RGC grape