
Using stringdist_join with differing column names


I have example data as follows:

library(fuzzyjoin)
a <- data.frame(x = c("season", "season", "season", "package", "package"), y = c("1","2", "3", "1","6"))


b <- data.frame(x = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"), w = c("1","2", "3", "2","6"))

So the following runs fine:

d <- stringdist_left_join(a,b, by = "x", max_dist = 2)

But joining on columns with different names does not work (note that the join is now between a and c):

e <- stringdist_left_join(a,c, by = c("x", "z"), max_dist = 2)

I would like to tell stringdist_left_join to join on two columns with different names, as in the last line of code (e), but it does not seem to accept that.

Is there any solution to this (other than copying the column and giving it another name)?


Solution

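  • Use a named vector for by, c("x" = "z"), when the key columns have different names. The name is the column in the left table (a) and the value is the column in the right table (c), the same convention dplyr joins use: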

    e <- stringdist_left_join(a,c, by = c("x" = "z"), max_dist = 2)
    

    Output:

             x y       z w
    1   season 1  season 1
    2   season 1   seson 2
    3   season 1   seson 3
    4   season 2  season 1
    5   season 2   seson 2
    6   season 2   seson 3
    7   season 3  season 1
    8   season 3   seson 2
    9   season 3   seson 3
    10 package 1 package 2
    11 package 1 pakkage 6
    12 package 6 package 2
    13 package 6 pakkage 6
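As a self-contained check, here is a minimal sketch that rebuilds the example data, runs the join with a named by, and adds a distance column through the distance_col argument (the name "dist" is an arbitrary choice). It also calls stringdist::stringdist() with method "osa", which I believe is the default method for stringdist_join, to show why "seson" and "pakkage" fall within max_dist = 2 while the other pairs do not.

```r
library(fuzzyjoin)
library(stringdist)

a <- data.frame(x = c("season", "season", "season", "package", "package"),
                y = c("1", "2", "3", "1", "6"))
c <- data.frame(z = c("season", "seson", "seson", "package", "pakkage"),
                w = c("1", "2", "3", "2", "6"))

# Named vector: name = key column in the left table (a),
# value = key column in the right table (c)
e <- stringdist_left_join(a, c, by = c("x" = "z"), max_dist = 2,
                          distance_col = "dist")  # "dist" is an arbitrary name
print(e)

# Pairwise distances with the "osa" method explain which rows were matched:
# "seson" is one deletion away from "season", "pakkage" one substitution from "package"
stringdist("season", c("season", "seson", "pakkage"), method = "osa")
stringdist("package", c("package", "seson", "pakkage"), method = "osa")
```

Each of the three "season" rows in a matches three rows in c and each of the two "package" rows matches two, which gives the 13 rows shown above.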