Here is an example of my data
Table 1: User table
id  address
1   mont carlo road,CA
2   mont road,IS
3   mont carlo road1-11,CA
Table 2: Similarity matrix (the output I want to get)
id  1  2  3
1
2   3
3   1  3
Scale: 1 to 3, where 1 = very similar and 3 = very dissimilar
My problem is how to measure the similarity between the records in Table 1 by address, and then output the result in R as a similarity matrix like Table 2. Specifically: how do I compare two strings in R, map each comparison onto a scale that measures the similarity of the pair, and finally output a matrix?
I'd also use the stringdist package, but would make use of outer() and cut() to finish the job:
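library(stringdist)

dat <- data.frame(
  address = c("mont carlo road,CA", "mont road,IS", "mont carlo road1-11,CA"),
  id = 1:3
)

# Pairwise distances between all addresses; method = "jw" with the default
# p = 0 is the Jaro distance (0 = identical, 1 = completely different)
m <- outer(dat[["address"]], dat[["address"]], stringdist, method = "jw")

# Bin the off-diagonal distances into 3 equal-width classes (1 = most similar)
m[lower.tri(m)] <- cut(m[lower.tri(m)], 3, labels = 1:3)
m[upper.tri(m)] <- cut(m[upper.tri(m)], 3, labels = 1:3)

dimnames(m) <- list(dat[["id"]], dat[["id"]])
diag(m) <- NA  # blank out self-comparisons
m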
##    1  2  3
## 1 NA  3  1
## 2  3 NA  3
## 3  1  3 NA
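One thing worth knowing about cut() here: when breaks is a single number, it splits the observed range of the data into that many equal-width intervals, so the 1 to 3 scale is relative to whatever distances happen to be in the matrix. A minimal sketch with made-up distances (the values and the fixed cut points 0.2 and 0.6 are assumptions for illustration only):

```r
# Hypothetical distances that are all quite small (i.e. all pairs are similar)
d <- c(0.05, 0.10, 0.15)

# breaks = 3 splits the observed range [0.05, 0.15] into 3 equal-width bins,
# so the values are still spread over all three classes
rel <- as.integer(cut(d, 3, labels = 1:3))
rel
# 1 2 3

# Explicit break points give an absolute scale instead
# (0.2 and 0.6 are arbitrary, assumed thresholds)
fixed <- as.integer(cut(d, breaks = c(0, 0.2, 0.6, 1), labels = 1:3,
                        include.lowest = TRUE))
fixed
# 1 1 1
```

If you need "very similar" to mean the same thing across different data sets, fixed break points are probably the safer choice.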
You can use whatever method you want for calculating the distance (see ?stringdist).
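If you would rather not depend on an extra package, base R's adist() returns Levenshtein edit distances and can be used in much the same way. The following is only a sketch of that variant: normalising by the longer string's length and the cut points 0.25 and 0.5 are my own assumptions, so tune them to your data.

```r
addresses <- c("mont carlo road,CA", "mont road,IS", "mont carlo road1-11,CA")
ids <- seq_along(addresses)

# Levenshtein edit distance between every pair (utils::adist, base R)
lev <- adist(addresses)

# Normalise by the length of the longer string so values fall in [0, 1]
len <- nchar(addresses)
norm <- lev / outer(len, len, pmax)

# Map to a 1..3 scale using fixed cut points (assumed thresholds)
codes <- as.integer(cut(norm, breaks = c(0, 0.25, 0.5, 1), labels = 1:3,
                        include.lowest = TRUE))
sim <- matrix(codes, nrow = length(addresses))

dimnames(sim) <- list(ids, ids)
diag(sim) <- NA  # blank out self-comparisons
sim
```

Because both the metric and the thresholds differ, the classes need not match the Jaro-based matrix above; here the pair (1, 2) lands in the middle class rather than the most dissimilar one.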