I have two vectors, like
v1<-c("yellow", "red", "orange", "blue", "green")
v2<-c("blues", "redx", "grean")
and I want to match them, i.e., to "link" each element of v1 with the most similar element of v2, so that the result is
> df
v1 v2
1 yellow <NA>
2 red redx
3 orange <NA>
4 blue blues
5 green grean
The following code gives the expected result, but only because it has been manually tailored to do so:
df <- data.frame(v1, v2 = rep(NA, 5))
for (i in 1:nrow(df)) {
  ag <- agrep(df[i, 1], v2, ignore.case = TRUE, value = TRUE)
  if (length(ag) == 0) {
    df[i, 2] <- NA
  } else if (length(ag) == 1) {
    df[i, 2] <- ag
  } else {
    df[i, 2] <- ag[1]
  }
}
The problem is that agrep(df[2,1], v2, max.distance = 0.00001, ignore.case = T, value = T) returns "redx" "grean", even though max.distance is set to 0.00001.
That's why I have the if conditions, but they don't guarantee that the most similar match is selected.
How can I overcome this issue?
Thank you in advance
You could try:
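s <- which(adist(v1, v2) <= 1, arr.ind = TRUE) # 1 is the maximum allowed edit distance
# start from a full-length NA vector so trailing unmatched elements of v1 are kept
data.frame(v1, v2 = replace(rep(NA, length(v1)), s[, 1], v2[s[, 2]]))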
v1 v2
1 yellow <NA>
2 red redx
3 orange <NA>
4 blue blues
5 green grean
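The threshold above keeps every pair within the cutoff, so if one element of v1 were close to several elements of v2 it would still not be obvious which one is "most similar". As far as I understand, that is also what happens with agrep: it matches approximately against substrings, and a fractional max.distance is rounded up to a whole number of edits, so "red" can still match inside "grean". A minimal sketch of picking the single nearest candidate per element instead, assuming plain edit distance is an acceptable measure of similarity (max_dist, d, best and closest are names made up for this illustration):

```r
v1 <- c("yellow", "red", "orange", "blue", "green")
v2 <- c("blues", "redx", "grean")

max_dist <- 1  # assumed cutoff: largest edit distance still counted as a match

# Edit distance between every pair (rows = v1, columns = v2)
d <- adist(v1, v2, ignore.case = TRUE)

# For each element of v1, the index of the closest element of v2
# (which.min returns the first one in case of ties)
best <- apply(d, 1, which.min)

# Keep the closest candidate only when it is within the cutoff
closest <- d[cbind(seq_along(v1), best)]
df <- data.frame(v1, v2 = ifelse(closest <= max_dist, v2[best], NA))
df
```

Raising max_dist makes the matching more permissive, but each element of v1 still gets at most one partner, namely the nearest one.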