Search code examples
excelvbasimilarity

I want to check for similarity for text in corresponding cells in excel


I have a data set in my excel sheet, The data in each cell is a set of numbers separated by ";".

Below is my data set

The expected result is in 2nd cell all the four numbers are there in both column (G and H), but not in same order. In the next row, the order is same. So is there any way to check the similarity

I have tried using below code but it only seem to highlight (red) first few characters

This is the output of my code

If anyone wants to see the file then click here

The code is :

Sub highlight()
    Dim xRg1 As Range
    Dim xRg2 As Range
    Dim xTxt As String
    Dim xCell1 As Range
    Dim xCell2 As Range
    Dim I As Long
    Dim J As Integer
    Dim xLen As Integer
    Dim xDiffs As Boolean
    On Error Resume Next
    If ActiveWindow.RangeSelection.Count > 1 Then
      xTxt = ActiveWindow.RangeSelection.AddressLocal
    Else
      xTxt = ActiveSheet.UsedRange.AddressLocal
    End If
lOne:
    Set xRg1 = Application.InputBox("Range A:", "Similarity finder", xTxt, , , , , 8)
    If xRg1 Is Nothing Then Exit Sub
    If xRg1.Columns.Count > 1 Or xRg1.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lOne
    End If
lTwo:
    Set xRg2 = Application.InputBox("Range B:", "Similarity finder", "", , , , , 8)
    If xRg2 Is Nothing Then Exit Sub
    If xRg2.Columns.Count > 1 Or xRg2.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lTwo
    End If
    If xRg1.CountLarge <> xRg2.CountLarge Then
       MsgBox "Two selected ranges must have the same numbers of cells ", vbInformation, "Similarity finder"
       GoTo lTwo
    End If
    xDiffs = (MsgBox("Click Yes to highlight similarities, click No to highlight differences ", vbYesNo + vbQuestion, "Similarity finder") = vbNo)
    Application.ScreenUpdating = False
    xRg2.Font.ColorIndex = xlAutomatic
    For I = 1 To xRg1.Count
        Set xCell1 = xRg1.Cells(I)
        Set xCell2 = xRg2.Cells(I)
        If xCell1.Value2 = xCell2.Value2 Then
            If Not xDiffs Then xCell2.Font.Color = vbRed
        Else
            xLen = Len(xCell1.Value2)
            For J = 1 To xLen
                If Not xCell1.Characters(J, 1).Text = xCell2.Characters(J, 1).Text Then Exit For
            Next J
            If Not xDiffs Then
                If J <= Len(xCell2.Value2) And J > 1 Then
                    xCell2.Characters(1, J - 1).Font.Color = vbRed
                End If
            Else
                If J <= Len(xCell2.Value2) Then
                    xCell2.Characters(J, Len(xCell2.Value2) - J + 1).Font.Color = vbRed
                End If
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub

Solution

  • One way to do this is to use the first column to create a regular expression and apply it against the second column.

    One advantage of using regex is that one of the data returned is the start and length of the match -- perfect for addressing the characters property of the range object.

    I used early binding (see the reference to be set in the code notes), but you could use late binding if you must.

    I also have the data in columns A & B, but you can alter that with the part of the code that defines the data location.

    You should NOT need to use any On Error code. Much better to write the code to handle any forseeable errors. I did NOT do any error checking, and that may need to be added.

    If speed is an issue, there are various other modifications which can be made.

    The constructed regular expression will have the general appearance of

    \b(?:nnn|nnn|nnn|nnn)\b

    which means to

    • match a word boundary
    • match any of the pipe delimited substrings
    • match another word boundary.

    For more information, see How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

    Option Explicit
    'Set reference to Microsoft VBScript Regular Expressions 5.5
    Sub highLight()
        Dim R As Range, C  As Range, WS As Worksheet
        Dim RE As RegExp, MC As MatchCollection, M As Match
        Dim sSplit As String
        
    'set the data range
    '   one column wide
    '   column 2 will be offset 1 to the left
    'Obviously you can change this in many ways
    'And also work in your user selected method as in your code.
    'only requirement is that the ranges be single column, and you can
    'easily check for that
    Set WS = ThisWorkbook.Worksheets("Sheet4")
    With WS
        Set R = Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
    End With
    
    Set RE = New RegExp
    Application.ScreenUpdating = False
    With RE
        .Global = True
        
        'loop through the first column
            For Each C In R.Rows
                'replace the semicolon with the pipe
                sSplit = Replace(Join(Split(C.Value, ";"), "|"), " ", "")
                    
                    'since data has a terminal semi-colon, need to remove it if present
                    If Right(sSplit, 1) = "|" Then sSplit = Left(sSplit, Len(sSplit) - 1)
                
                'finish construction of the regex pattern
                .Pattern = "\b(?:" & sSplit & ")\b"
                
                'check for matches and change relevant characters font color
                Set MC = .Execute(C.Offset(columnoffset:=1))
                With C.Offset(-0, 1)
                    .Font.Color = vbBlack
                    For Each M In MC
                        .Characters(M.FirstIndex + 1, M.Length).Font.Color = vbRed
                    Next M
                End With
            Next C
    End With
    End Sub
    

    enter image description here