I have list of different categories in same dataset in two different columns.
Example data how the comparison should be:
There are some similar words in the two columns. So by comparing the two columns the maximum matched wordings should be created in new column as shown in attached image. Is their any formula that we can compare two columns.
I tried to use vlookup
the cases but in that only full matched columns are joining. I need matching words to join
This VBA solution kind of works I guess but is definitely not 100% reliable:
Sub matchWords()
Dim arr() As Variant, rng As Range, lastRow As Long, lastColumn As Long
With ActiveSheet
lastColumn = 3
lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
End With
Set rng = Range(Cells(2, 1), Cells(lastRow, lastColumn))
arr = rng
Dim val1 As String, val2 As String
For i = LBound(arr) To UBound(arr)
arr(i, 3) = ""
For j = LBound(arr) To UBound(arr)
val1 = LCase(arr(i, 2))
If val1 <> "" Then val1 = Split(val1, ".")(UBound(Split(val1, ".")))
val2 = LCase(Left(Replace(arr(j, 1), " ", ""), Int(Len(Replace(arr(j, 1), " ", "")) * 0.7)))
If InStr(1, val1, val2) <> 0 Then
arr(i, 3) = arr(j, 1) & "?" & arr(i, 3)
End If
Next j
If val1 <> "" And arr(i, 3) <> "" Then arr(i, 3) = Left(arr(i, 3), Len(arr(i, 3)) - 1)
Next i
rng = arr
End Sub
Tested on this table called "data". Sometimes, there is more than one match in which case both of them are put into the third column, separated by a question mark:
cat1 |
cat2 |
matchedCat1 |
---|---|---|
anesthesiologist |
professional. Accountants |
accounting |
appliances |
nightlife.adultentertainment |
adult entertainment |
aircraft dealer |
professional.advertising |
advertising |
airline |
industry.agriculture |
agriculture |
animal shelter |
auto.aircraftdealers |
aircraft dealer |
apartments |
hotelstravel.transport.airlines |
airline |
airport |
hotelstravel.airports |
airport |
alternative medicine |
hotelstravel.airports.airportterminals |
airport terminal?airport |
amusement park |
health.physicians.allergist |
allergist |
airport terminal |
health.alternativemedicine |
alternative medicine |
allergist |
active.amusementparks |
amusement park |
accounting |
health.physicians.anesthesiologists |
anesthesiologist |
advertising |
pets.animalshelters |
animal shelter |
agriculture |
realestate.apartments |
apartments |
adult entertainment |
shopping.homeandgarden.appliances |
appliances |
*just in case any moderator comes across this: The reason the table looks like this is because SO will not allow me to post it without each of the entries formatted as code. This "feature" seriously needs to either be fixed or done away with completely.