
Match strings exactly against a lookup table in R


I have a lookup table of patterns and their replacements, but some patterns are substrings of others, and I want each pattern to match only as a whole word.

library(dplyr)  # provides tibble(), mutate(), and %>%

lookup <- tibble(
  pattern = c("ONE", "ONET", "ONETR"),
  replacement = c("one new", "this is 2", "for 3")
)
other_table <- tibble(
  strings = c(
    "I want to replace ONE",
    "Or ONET is what to change",
    "We can change ONE again",
    "ONETR also can be replaced"
  ),
  other_dat = 1:4
)

I've tried stringi's fixed-pattern replacement, but it gives the wrong result when one pattern is a substring of another.

other_table %>%
  mutate(
    strings = stringi::stri_replace_all_fixed(
      strings, 
      pattern = lookup$pattern, 
      replacement = lookup$replacement,
      vectorize_all = FALSE)
    )
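
To make the failure concrete, here is a minimal sketch of the same call on a plain character vector, cut down to two strings and two patterns from the tables above. With vectorize_all = FALSE the patterns are applied one after another, so "ONE" is replaced inside "ONET" before "ONET" is ever tried.

```r
library(stringi)

# Two of the strings and two of the patterns from the tables above.
x <- c("I want to replace ONE", "Or ONET is what to change")

# Patterns are applied in order to every string.
res <- stri_replace_all_fixed(
  x,
  c("ONE", "ONET"),
  c("one new", "this is 2"),
  vectorize_all = FALSE
)
print(res)
# [1] "I want to replace one new"     "Or one newT is what to change"
```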

What function can I use to replace every pattern found in other_table$strings with the matching lookup$replacement?

Desired Output:

  strings                        other_dat
  <chr>                              <int>
1 I want to replace one new              1
2 Or this is 2 is what to change         2
3 We can change one new again            3
4 for 3 also can be replaced             4

Any help appreciated!


Solution

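  • Use a regex with word boundaries ("\\b") around each pattern instead of a fixed match, so that "ONE" no longer matches inside "ONET" or "ONETR". This means calling stri_replace_all with the regex argument rather than stri_replace_all_fixed.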

    other_table %>%
      mutate(
        strings = stringi::stri_replace_all(
          strings, 
          regex = paste0("\\b", lookup$pattern, "\\b"), 
          replacement = lookup$replacement,
          vectorize_all = FALSE)
        )
    # # A tibble: 4 x 2
    #   strings                        other_dat
    #   <chr>                              <int>
    # 1 I want to replace one new              1
    # 2 Or this is 2 is what to change         2
    # 3 We can change one new again            3
    # 4 for 3 also can be replaced             4
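
If stringi is not available, the same word-boundary idea can be sketched in base R. This is an alternative to the answer above, not part of it: replace_whole_words is a hypothetical helper name, and the sketch assumes the patterns contain no regex metacharacters and the replacements contain no backslashes.

```r
# Hypothetical helper: apply each pattern in turn as a whole-word regex.
replace_whole_words <- function(x, patterns, replacements) {
  for (i in seq_along(patterns)) {
    # \\b anchors the pattern at word boundaries, so "ONE" skips "ONET".
    regex <- paste0("\\b", patterns[i], "\\b")
    x <- gsub(regex, replacements[i], x, perl = TRUE)
  }
  x
}

strings <- c(
  "I want to replace ONE",
  "Or ONET is what to change",
  "We can change ONE again",
  "ONETR also can be replaced"
)

result <- replace_whole_words(
  strings,
  patterns = c("ONE", "ONET", "ONETR"),
  replacements = c("one new", "this is 2", "for 3")
)
print(result)
```

Inside a pipeline this could be used as mutate(strings = replace_whole_words(strings, lookup$pattern, lookup$replacement)).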