Search code examples
rjoinmergestring-matching

Join two data frames with different dimensions and match


This is a two-part question. I would like to join/ merge two data frames into one. They each have only one column but different numbers of rows. Some observations match; others do not.

df1 <- c("John", "Tommy", "Linda", "Cole")
df2 <- c("Tom", "John", "Bruce")
as.data.frame(df1)
as.data.frame(df2)

After joining them together, I need to put matching observations on the same row. I.e., John(df1) will be on the same row as John(df2), and same with Tommy(df1) and Tom(df2). This may involve fuzzy match as variations of names (Tom/ Tommy) will need to be matched.

If there is an observation in df1 that does not exist in df2 (like Linda), I would like a NA in that row for df2. Vice-versa for Bruce.

Desired output:

df1 df2
John John
Tommy Tom
Linda NA
Cole NA
NA Bruce

Please advise. I have tried variations of functions merge, match, join etc but without success. Thanks in advance!


Solution

  • Use fuzzyjoin:

    df1 <- data.frame(df1 = c("John", "Tommy", "Linda", "Cole"))
    df2 <- data.frame(df2 = c("Tom", "John", "Bruce"))
    
    fuzzyjoin::stringdist_full_join(df1, df2, by = c(df1 = "df2"))
    
    #     df1   df2
    # 1  John  John
    # 2 Tommy   Tom
    # 3 Linda  <NA>
    # 4  Cole  <NA>
    # 5  <NA> Bruce