fuzzy_left_join with match_fun %in%

Some data

example_df <- data.frame(
  url = c('blog/blah', 'blog/?utm_medium=foo', 'blah', 'subscription/apples', 'UK/something'),
  numbs = 1:5
)

lookup_df <- data.frame(
  string = c('blog', 'subscription', 'UK'),
  group = c('blog', 'subs', 'UK')
)


library(fuzzyjoin)
data_combined <- example_df %>% 
  fuzzy_left_join(lookup_df, by = c("url" = "string"), 
                  match_fun = `%in%`)

data_combined
                   url numbs string group
1            blog/blah     1   <NA>  <NA>
2 blog/?utm_medium=foo     2   <NA>  <NA>
3                 blah     3   <NA>  <NA>
4  subscription/apples     4   <NA>  <NA>
5         UK/something     5   <NA>  <NA>

I expected data_combined to have values for string and group where there's a match based on match_fun. Instead all NA.

Example, the first value of string in lookup_df is 'blog'. Since this is %in% the first value of example_df string, expected a match with value 'blog' and 'blog' in string and group fields.

Solution

If we want to do a partial match with the word before the / in the 'url' with the 'string' column in 'lookup_df', we could extract that substring as a new column and then do a regex_left_join

library(dplyr)
library(fuzzyjoin)
library(stringr)
example_df %>%
    mutate(string = str_remove(url, "\\/.*")) %>% 
    regex_left_join(lookup_df, by = 'string') %>%
    select(url, numbs, group)

-output

#                   url numbs group
#1            blog/blah     1  blog
#2 blog/?utm_medium=foo     2  blog
#3                 blah     3  <NA>
#4  subscription/apples     4  subs
#5         UK/something     5    UK