Search code examples
vbaexcelexcel-2016excel-2002

Deleting duplicate text in a cell in excel


I was wondering how to remove duplicate names/text's in a cell. For example

Jean Donea Jean Doneasee 
R.L. Foye R.L. Foyesee 
J.E. Zimmer J.E. Zimmersee 
R.P. Reed R.P. Reedsee  D.E. Munson D.E. Munsonsee 

While googling, I stumbled upon a macro/code, it's like:

Function RemoveDupes1(pWorkRng As Range) As String
'Updateby20140924
Dim xValue As String
Dim xChar As String
Dim xOutValue As String
Set xDic = CreateObject("Scripting.Dictionary")
xValue = pWorkRng.Value
For i = 1 To VBA.Len(xValue)
    xChar = VBA.Mid(xValue, i, 1)
   If xDic.exists(xChar) Then
   Else
      xDic(xChar) = ""
      xOutValue = xOutValue & xChar
   End If
Next
RemoveDupes1 = xOutValue
End Function

The macro is working, but it is comparing every letter, and if it finds any repeated letters, it's removing that.

When I use the code over those names, the result is somewhat like this:

Jean Dos
R.L Foyes
J.E Zimers
R.P edsDEMuno

By looking at the result I can make out it is not what I want, yet I got no clue how to correct the code.

The desired output should look like:

 Jean Donea
 R.L. Foye 
 J.E. Zimmer
 R.P. Reed 

Any suggestions?

Thanks in Advance.


Solution

  • Input

    With the input on the image:

    ![Input names

    Result

    The Debug.Print output

    Output

    Regex

    A regex can be used dynamically iterating on the cell, to work as a Find tool. So it will extract only the shortest match. \w*( OUTPUT_OF_EXTRACTELEMENT )\w*, e.g.: \w*(Jean)\w*

    The Regex's reference must be enabled.

    Code

    Function EXTRACTELEMENT(Txt As String, n, Separator As String) As String
        On Error GoTo ErrHandler:
        EXTRACTELEMENT = Split(Application.Trim(Mid(Txt, 1)), Separator)(n - 1)
        Exit Function
    ErrHandler:
        ' error handling code
        EXTRACTELEMENT = 0
        On Error GoTo 0
    End Function
    
    Sub test()
    
    Dim str As String
    Dim objMatches As Object
    Set objRegExp = CreateObject("VBScript.RegExp") 'New regexp
    lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
    For Row = 1 To lastrow
        str = Range("A" & Row)
        F_str = ""
        N_Elements = UBound(Split(str, " "))
        If N_Elements > 0 Then
            For k = 1 To N_Elements + 1
                strPattern = "\w*(" & EXTRACTELEMENT(CStr(str), k, " ") & ")\w*"
                With objRegExp
                    .Pattern = strPattern
                    .Global = True
                End With
                If objRegExp.test(strPattern) Then
                    Set objMatches = objRegExp.Execute(str)
                    If objMatches.Count > 1 Then
                        If objRegExp.test(F_str) = False Then
                            F_str = F_str & " " & objMatches(0).Submatches(0)
                        End If
                    ElseIf k <= 2 And objMatches.Count = 1 Then
                        F_str = F_str & " " & objMatches(0).Submatches(0)
                    End If
                End If
            Next k
        Else
            F_str = str
        End If
        Debug.Print Trim(F_str)
    Next Row
    
    End Sub
    

    Note that you can Replace the Debug.Print to write on the target cell, if it is column B to Cells(Row,2)=Trim(F_str)

    Explanation

    Function

    You can use this UDF, that uses the Split Function to obtain the element separated by spaces (" "). So it can get every element to compare on the cell.

    Loops

    It will loop from 1 to the number of elements k in each cell and from row 1 to lastrow.

    Regex

    The Regex is used to find the matches on the cell and Join a new string with the shortest element of each match.