Suppose I have those data:
data <- tibble(
x = c("ANOTHER", "COMMON", "ZEBRA")
)
And I want to join this dataframe with this one:
EDIT: The prefix has no fixed length
selection <- tibble(
x_prefix = c("A", "B", "CO"),
type = c("One type", "Other type", "Other type")
)
I want to make an INNER JOIN with the x
and x_prefix
columns, with the idea that a line is kept if x_prefix
is a prefix of x
.
The expected answer is:
answer <- tibble(
x = c("ANOTHER", "COMMON"),
type = c("One type", "Other type")
)
How one can do that with dplyr?
You can do a fuuzy_inner_join
with str_detect
as the matching function. The symbol "^" is from regex, meaning start of a string, therefore we need to paste
it to your pattern to only match strings starting with x_prefix
.
library(fuzzyjoin)
library(tidyverse)
fuzzy_inner_join(data, selection, by = c("x" = "x_prefix"),
match_fun = \(x, y) str_detect(x, paste0("^", y))) |>
select(-x_prefix)
# A tibble: 2 x 2
x type
<chr> <chr>
1 ANOTHER One type
2 COMMON Other type