I have two datasets that I want to merge:
df1 <- data.frame( title =
c("residence mozart",
"les hesperides auteuil mirabeau",
"chaillot",
"jouvenet",
"retraite dosne"))
df2 <- data.frame(title = c("terrasses mozart", "chaillot",
"villa jules janin", "retraites dosne"))
And I would like to get something like this:
1  residence mozart                 NA (or terrasses mozart)
2  les hesperides auteuil mirabeau  NA
3  chaillot                         chaillot
4  jouvenet                         NA
5  retraite dosne                   retraites dosne
Here is what I tried:
x = data.frame(title_df2 = matrix(ncol = 1, nrow = nrow(df1)))
for (i in nbr){
x[i, ] <- grep(df1$title[i], df2$title, value = T)
}
It does not work at all! Even though grep(df1$title[3], df2$title, value = T)
works and returns "chaillot"!
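If I understand correctly, you want to match each title in df1 to its closest title in df2, even when the spelling differs slightly. A string-distance join from the fuzzyjoin package can do that: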
df1 <- data.frame( title =
c("residence mozart",
"les hesperides auteuil mirabeau",
"chaillot",
"jouvenet",
"retraite dosne"))
df2 <- data.frame(title = c("terrasses mozart", "chaillot",
"villa jules janin", "retraites dosne"))
library(dplyr)
library(fuzzyjoin)
stringdist_left_join(x = df1, y = df2, method = "jw", distance_col = "d") %>%
filter(d < 0.25) %>%
right_join(df1, by = c("title.x" = "title"))
#> Joining by: "title"
#> title.x title.y d
#> 1 residence mozart terrasses mozart 0.23863636
#> 2 chaillot chaillot 0.00000000
#> 3 retraite dosne retraites dosne 0.09206349
#> 4 les hesperides auteuil mirabeau <NA> NA
#> 5 jouvenet <NA> NA
Created on 2021-04-19 by the reprex package (v2.0.0)
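As for why the loop in the question fails: grep() returns a zero-length vector when nothing matches, and a zero-length value cannot be assigned into x[i, ]. The following is only a minimal base R sketch of one way around that, not the method used above. The helper name first_match and the choice to keep only the first hit are assumptions made for illustration, and fixed = TRUE treats each title as a literal substring rather than a regular expression.

```r
df1 <- data.frame(title = c("residence mozart",
                            "les hesperides auteuil mirabeau",
                            "chaillot",
                            "jouvenet",
                            "retraite dosne"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(title = c("terrasses mozart", "chaillot",
                            "villa jules janin", "retraites dosne"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: return the first candidate containing `pattern`,
# or NA when grep() finds nothing (it then returns character(0)).
first_match <- function(pattern, candidates) {
  hits <- grep(pattern, candidates, value = TRUE, fixed = TRUE)
  if (length(hits) == 0) NA_character_ else hits[1]
}

# One result per row of df1, so the new column always has nrow(df1) entries.
df1$title_df2 <- vapply(df1$title, first_match, character(1),
                        candidates = df2$title, USE.NAMES = FALSE)
df1
```

Note that plain substring matching only pairs up "chaillot" here: "retraite dosne" is not a substring of "retraites dosne", so that row stays NA. That is the gap the string-distance join above fills.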