Search code examples
excelvbatextduplicateshighlight

Find and highlight duplicate cells AND text strings within cells in a selected Excel column


How can I highlight cells with duplicate text strings in an Excel column?

Although it works perfectly well for duplicated cells, there doesn't seem to be any way to do this in the Conditional Formatting feature.

I frequently have an issue with customer bill of materials, where they contain duplicate references. In the provided example, reference R60 is listed twice in item 103, and reference R32 is in two different rows, in Item 105 and 106. So only looking for duplicate cells won't work.

Example (pasted from Excel, for some reason it won't let me insert a picture):

Item Qty Reference
100 1 U12
101 1 U3
102 5 R38,R39,R40,R41,R45
103 1 R60,R60
104 1 R13
105 2 R17,R32
106 2 R32,R43
107 8 R8-9,R26,R30,R36,R44,R58,R61
108 2 R19,R24
109 2 R53,R59
110 3 R16,R46-47

Additionally, different customers will separate references in different ways. Some use a comma, some use a space, and some use both a comma and a space. Occasionally they will use a combination of them. There can be hundreds of references in a given cell, so using Text to Columns and then using Conditional Formatting (which I saw suggested as a possible solution in a similar post) isn't going to work for me. Ideally, if there is a solution to this, it will take all this into account. Being able to select a delimiter might work too.

Based on several hours of web searching and experimentation, COUNTIF seems to be the key, but I am not at all familiar with the function or how to manipulate it.

Below is the VBA code I've been working on. The first part is just using the Conditional Formatting feature. The second part is a mash-up of two different codes I found that I thought would work, but I'm probably not using them right. I am very much a neophyte at VBA coding. My apologies in advance for that.

 Sub DuplicateRed() 
 
 '
 ' DuplicateRed Macro
 ' First, turn duplicate cells red and second, duplicate text strings Within cells Red 
 ' - November 15 2021
 
 ' First, turn duplicate cells Red
 
     Dim r As Range ' Runs the macro on the selected column / cells
 
     Selection.FormatConditions.AddUniqueValues
     Selection.FormatConditions(Selection.FormatConditions.Count).SetFirstPriority
     Selection.FormatConditions(1).DupeUnique = xlDuplicate
     With Selection.FormatConditions(1).Font
         .Color = -16383844
         .TintAndShade = 0
     End With
     With Selection.FormatConditions(1).Interior
         .PatternColorIndex = xlAutomatic
         .Color = 13551615
         .TintAndShade = 0
     End With
     Selection.FormatConditions(1).StopIfTrue = False
 
 'Second, turn any duplicate text stings within cells Red
 
   Range(Addr) = Evaluate("IF(COUNTIF(" & Addr & "," & Addr &
 ")>1,""=""&" & Addr & "," & Addr & ")")   On Error Resume Next  
 Range(Addr).SpecialCells(xlFormulas).Interior.ColorIndex = 6  
 Range(Addr).Replace "=", "", xlPart
 
 ' Locate duplicate values in selected range
 
         If Application.Evaluate("COUNTIF(" & myDataRng.Address & "," & cell.Address & ")") > 1 Then
             cell.Offset(0, 0).Font.Color = vbRed        ' CHANGE COLOR TO RED.
         End If  Next cell
      
     Set myDataRng = Nothing ErrHandler:
     Application.EnableEvents = True
     Application.ScreenUpdating = True
     
      End Sub' ' DuplicateRed Macro ' First, turn duplicate

Ultimately, what I'd like to be able to do is select the references column (or a portion of it), and then run a VBA script on that selection to highlight the cells with duplicated references in them. I do not want to delete any of the duplicates, since I need to be able to tell customers that there is an error in their bill of materials.

Edit: New VBA code with help from JNevill

    Sub highlight_duplicates()

' Turn duplicate cells and duplicate text strings within cells Red
' November 15 2021
' Credit to JNevill

'First, turn any duplicated text stings within cells Red

    'Declare variables used in this script
    Dim referenceRange As Range
    Dim referenceCell As Range
    Dim referenceArray As Variant
    Dim referenceVal As String
    Dim referenceItem As Variant
    
    'Grab the selection into a variable
    Set referenceRange = Selection
    
    'iterate through each cell in the range
    For Each referenceCell In referenceRange
    
        'Because we can have either a space or a comma as a delimiter,
        '  lets make them all comma so it's easier to deal with.
        '  Note this doesn't change the value in the cell, just the
        '  variable here in VBA.
        referenceVal = Replace(referenceCell.Value, ", ", ",")
        referenceVal = Replace(referenceCell.Value, " ", ",")
        
        'Break this thing into an array so it's easier to work with each
        '  value. The big advantage here is that we can iterate through
        '  an array, where iterating through a string is a nightmare.
        referenceArray = Split(referenceVal, ",")
        
        'We will use a dictionary to determine if there are duplicates in
        '  in this array. By definition an item in a dictionary can not be
        '  a duplicate so we just dump all the values of the array into
        '  the dictionary and then count elements of both the dictionary
        '  and the array. If they are they same, then the array has no
        '  duplicates.
        With CreateObject("Scripting.Dictionary")
        
            'Dump array into dictionary
            For Each referenceItem In referenceArray
                If Not .Exists(referenceItem) Then .Add referenceItem, 1
            Next referenceItem
            
            'Toggle looks of cell based on uniqueness
            If .Count < UBound(referenceArray) + 1 Then
                With referenceCell.Font
                    .Color = -16383844
                    .Bold = True
                End With
            Else
                With referenceCell.Font
                    .Color = 1
                    .Bold = False
                End With
            End If
            
        End With
    
    Next referenceCell
    
' Second, turn duplicate cells Red

    Dim r As Range
' Runs the macro on the selected column

    Selection.FormatConditions.AddUniqueValues
    Selection.FormatConditions(Selection.FormatConditions.Count).SetFirstPriority
    Selection.FormatConditions(1).DupeUnique = xlDuplicate
    With Selection.FormatConditions(1).Font
        .Color = -16383844
        .TintAndShade = 0
    End With
    With Selection.FormatConditions(1).Interior
        .PatternColorIndex = xlAutomatic
        .Color = 13551615
        .TintAndShade = 0
    End With
    Selection.FormatConditions(1).StopIfTrue = False


End Sub

Solution

  • This is definitely one of those "This should be easy" things that can end up getting ugly quick. Thankfully there's a few tricks we can employ in VBA to make this easier to deal with.

    1. Splitting the comma/space delimited string into an Array so that we can deal with each word/reference in the array independently.
    2. Using(Abusing) a Scripting.Dictionary because a dictionary only allows unique keys. If you compare the list of references stored in an array to a list of references stored in a dictionary and the counts are different, then the array, which holds the original list, MUST have duplicates

    Here's a working example:

    Sub highlight_duplicates()
    
        'Declare variables used in this script
        Dim referenceRange As Range
        Dim referenceCell As Range
        Dim referenceArray As Variant
        Dim referenceVal As String
        Dim referenceItem As Variant
        
        'Grab the selection into a variable
        Set referenceRange = Selection
        
        'iterate through each cell in the range
        For Each referenceCell In referenceRange
        
            'Because we can have either a space or a comma as a delimiter,
            '  lets make them all comma so it's easier to deal with.
            '  Note this doesn't change the value in the cell, just the
            '  variable here in VBA.
            referenceVal = Replace(referenceCell.Value, " ", ",")
            
            'Break this thing into an array so it's easier to work with each
            '  value. The big advantage here is that we can iterate through
            '  an array, where iterating through a string is a nightmare.
            referenceArray = Split(referenceVal, ",")
            
            'We will use a dictionary to determine if there are duplicates in
            '  in this array. By definition an item in a dictionary can not be
            '  a duplicate so we just dump all the values of the array into
            '  the dictionary and then count elements of both the dictionary
            '  and the array. If they are they same, then the array has no
            '  duplicates.
            With CreateObject("Scripting.Dictionary")
            
                'Dump array into dictionary
                For Each referenceItem In referenceArray
                    If Not .Exists(referenceItem) Then .Add referenceItem, 1
                Next referenceItem
                
                'Toggle looks of cell based on uniqueness
                If .Count < UBound(referenceArray) + 1 Then
                    With referenceCell.Font
                        .Color = -16383844
                        .Bold = True
                    End With
                Else
                    With referenceCell.Font
                        .Color = 1
                        .Bold = False
                    End With
                End If
                
            End With
        
        Next referenceCell
        
    End Sub