Search code examples
exceljoinmatchformulavlookup

Comparing similar words in two different columns


I have list of different categories in same dataset in two different columns.

Example data how the comparison should be:

enter image description here

There are some similar words in the two columns. So by comparing the two columns the maximum matched wordings should be created in new column as shown in attached image. Is their any formula that we can compare two columns.

I tried to use vlookup the cases but in that only full matched columns are joining. I need matching words to join


Solution

  • This VBA solution kind of works I guess but is definitely not 100% reliable:

    Sub matchWords()
        Dim arr() As Variant, rng As Range, lastRow As Long, lastColumn As Long
        
        With ActiveSheet
            lastColumn = 3
            lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        End With
        
        Set rng = Range(Cells(2, 1), Cells(lastRow, lastColumn))
        arr = rng
        
        
        Dim val1 As String, val2 As String
        For i = LBound(arr) To UBound(arr)
            arr(i, 3) = ""
            For j = LBound(arr) To UBound(arr)
                val1 = LCase(arr(i, 2))
                If val1 <> "" Then val1 = Split(val1, ".")(UBound(Split(val1, ".")))
                val2 = LCase(Left(Replace(arr(j, 1), " ", ""), Int(Len(Replace(arr(j, 1), " ", "")) * 0.7)))
                If InStr(1, val1, val2) <> 0 Then
                    arr(i, 3) = arr(j, 1) & "?" & arr(i, 3)
                End If
            Next j
            If val1 <> "" And arr(i, 3) <> "" Then arr(i, 3) = Left(arr(i, 3), Len(arr(i, 3)) - 1)
        Next i
        rng = arr
    End Sub
    

    Tested on this table called "data". Sometimes, there is more than one match in which case both of them are put into the third column, separated by a question mark:

    cat1 cat2 matchedCat1
    anesthesiologist professional. Accountants accounting
    appliances nightlife.adultentertainment adult entertainment
    aircraft dealer professional.advertising advertising
    airline industry.agriculture agriculture
    animal shelter auto.aircraftdealers aircraft dealer
    apartments hotelstravel.transport.airlines airline
    airport hotelstravel.airports airport
    alternative medicine hotelstravel.airports.airportterminals airport terminal?airport
    amusement park health.physicians.allergist allergist
    airport terminal health.alternativemedicine alternative medicine
    allergist active.amusementparks amusement park
    accounting health.physicians.anesthesiologists anesthesiologist
    advertising pets.animalshelters animal shelter
    agriculture realestate.apartments apartments
    adult entertainment shopping.homeandgarden.appliances appliances

    *just in case any moderator comes across this: The reason the table looks like this is because SO will not allow me to post it without each of the entries formatted as code. This "feature" seriously needs to either be fixed or done away with completely.