This is a two-part question. I would like to join/ merge two data frames into one. They each have only one column but different numbers of rows. Some observations match; others do not.
df1 <- c("John", "Tommy", "Linda", "Cole")
df2 <- c("Tom", "John", "Bruce")
as.data.frame(df1)
as.data.frame(df2)
After joining them together, I need to put matching observations on the same row. I.e., John(df1) will be on the same row as John(df2), and same with Tommy(df1) and Tom(df2). This may involve fuzzy match as variations of names (Tom/ Tommy) will need to be matched.
If there is an observation in df1 that does not exist in df2 (like Linda), I would like a NA in that row for df2. Vice-versa for Bruce.
Desired output:
df1 | df2 |
---|---|
John | John |
Tommy | Tom |
Linda | NA |
Cole | NA |
NA | Bruce |
Please advise. I have tried variations of functions merge, match, join etc but without success. Thanks in advance!
Use fuzzyjoin
:
df1 <- data.frame(df1 = c("John", "Tommy", "Linda", "Cole"))
df2 <- data.frame(df2 = c("Tom", "John", "Bruce"))
fuzzyjoin::stringdist_full_join(df1, df2, by = c(df1 = "df2"))
# df1 df2
# 1 John John
# 2 Tommy Tom
# 3 Linda <NA>
# 4 Cole <NA>
# 5 <NA> Bruce