Search code examples
stringvbaexcelparamarray

VBA Function to exclude parts of a string


My sub compares two lists of strings and returns the closest matches. I've found that the sub gets tripped up over some common words such as "the" and "facility". I would like to write a function that would be supplied an array of words to exclude and check each string for these words and exclude them if found.

Here is a sample input:

|aNames        |  bNames        | words to exclude
|thehillcrest  |oceanview health| the
|oceanview, the|hillCrest       | health

Intended Output:

|aResults     |bResuts
|hillcrest    |hillcrest
|oceanview    |oceanview

So far I have:

Dim ub as Integer
Dim excludeWords() As String

'First grab the words to be excluded
If sheet.Cells(2, 7).Value <> "" Then
  For y = 2 To sheet.Range("G:G").End(xlDown).Row
    ub = UBound(excludeWords) + 1             'I'm getting a subscript out of range error here..?
    ReDim Preserve excludeWords(0 To ub)
    excludeWords(ub) = sheet.Cells(y, 7).Value
  Next y
End If

Then my comparison function, using a double loop, will compare each string in column A with column B. Before the comparison, the value in column a and b will go through our function which will check for these words to exclude. It's possible that there will be no words to exclude, so the parameter should be optional:

Public Function normalizeString(s As String, ParamArray a() As Variant)
  if a(0) then           'How can I check?
    for i = 0 to UBound(a)
      s = Replace(s, a(i))
    next i
  end if
  normalizeString = Trim(LCase(s))
End Function

There's probably a few parts in this code that won't work. Might you be able to point me in the right direction?

Thank you!


Solution

  • To store the list in the array, you can do this

    Sub Sample()
        Dim excludeWords As Variant
        Dim lRow As Long
    
        With Sheet1 '<~~ Change this to the relevant sheet
            '~~> Get last row in Col G
            lRow = .Range("G" & .Rows.Count).End(xlUp).Row
    
            excludeWords = .Range("G2:G" & lRow).Value
    
            'Debug.Print UBound(excludeWords)
    
            'For i = LBound(excludeWords) To UBound(excludeWords)
                'Debug.Print excludeWords(i, 1)
            'Next i
        End With
    End Sub
    

    And then pass the array to your function. The above array is a 2D array and hence needs to be handled accordingly (see commented section in the code above)

    Also like I mentioned in the comments above

    How does oceanview, the become Oceanview? You can replace the but that would give you oceanview, (notice the comma) and not Oceanview.

    You may have to pass those special characters to Col G in the sheet or you can handle them in your function using a loop. For that you will have to use the ASCII characters. Please see this

    Followup from comments

    Here is something that I wrote quickly so it is not extensively tested. Is this what you are looking for?

    Sub Sample()
        Dim excludeWords As Variant
        Dim lRow As Long
    
        With Sheet1
            lRow = .Range("G" & .Rows.Count).End(xlUp).Row
    
            excludeWords = .Range("G2:G" & lRow).Value
    
            '~~> My column G has the word "habilitation" and "this"
            Debug.Print normalizeString("This is rehabilitation", excludeWords)
    
            '~~> Output is "is rehabilitation"
        End With
    End Sub
    
    Public Function normalizeString(s As String, a As Variant) As String
        Dim i As Long, j As Long
        Dim tmpAr As Variant
    
        If InStr(1, s, " ") Then
            tmpAr = Split(s, " ")
    
            For i = LBound(a) To UBound(a)
                For j = LBound(tmpAr) To UBound(tmpAr)
                    If LCase(Trim(tmpAr(j))) = LCase(Trim(a(i, 1))) Then tmpAr(j) = ""
                Next j
            Next i
            s = Join(tmpAr, " ")
        Else
            For i = LBound(a) To UBound(a)
                If LCase(Trim(s)) = LCase(Trim(a(i, 1))) Then
                    s = ""
                    Exit For
                End If
            Next i
        End If
    
        normalizeString = Trim(LCase(s))
    End Function