Search code examples
ms-accessvbasoundex

Finding similar sounding text in VBA


My manager tells me that there is a way to evaluate names that are spelled differently but sound similar in the way they are pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot find any info on Google.

Does this exist? Does anyone know if it is available for VBA (Access)?


Solution

  • Nice question! You're question includes a great example of the idea itself.

    There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by the phonetic rather than the actual spelling. In this question, Soundits and Soundex are similar sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]

    About Soundex:

    The Soundex algorithm is predicated on characteristics of English such as:

    1. The first letter has high significance
    2. Many consonants sound similar
    3. Consonants affect pronunciation more than vowels

    One warning: Soundex was designed for names. The shorter the better. As a name grows longer, the Soundex becomes less reliable.

    Resources:

    1. Here is an example that uses VBA for Access.
    2. There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
    3. There is a lot of information about Soundex and other variants such as Soundex2 (Search for 'Soundex' and 'VBA').

    Code Example:

    Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.

    Option Compare Database
    Option Explicit
    
    Public Function Soundex(varText As Variant) As Variant
    On Error GoTo Err_Handler
        Dim strSource As String
        Dim strOut As String
        Dim strValue As String
        Dim strPriorValue As String
        Dim lngPos As Long
    
        If Not IsError(varText) Then
            strSource = Trim$(Nz(varText, vbNullString))
            If strSource <> vbNullString Then
                strOut = Left$(strSource, 1&)
                strPriorValue = SoundexValue(strOut)
                lngPos = 2&
    
                Do
                    strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                    If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                        strOut = strOut & strValue
                        strPriorValue = strValue
                    End If
                    lngPos = lngPos + 1&
                Loop Until Len(strOut) >= 4&
            End If
        End If
    
        If strOut <> vbNullString Then
            Soundex = strOut
        Else
            Soundex = Null
        End If
    
    Exit_Handler:
        Exit Function
    
    Err_Handler:
        MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
        Resume Exit_Handler
    End Function
    Private Function SoundexValue(strChar As String) As String
        Select Case strChar
        Case "B", "F", "P", "V"
            SoundexValue = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z"
            SoundexValue = "2"
        Case "D", "T"
            SoundexValue = "3"
        Case "L"
            SoundexValue = "4"
        Case "M", "N"
            SoundexValue = "5"
        Case "R"
            SoundexValue = "6"
        Case vbNullString
            SoundexValue = "0"
        Case Else
            'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
        End Select
    End Function
    

    Levenshtein distance

    Another method of comparing strings is to get the Levenshtein distance. Here is the example given in VBA, it is taken from LessThanDot Wiki:

    Function LevenshteinDistance(word1, word2)
    
    Dim s As Variant
    Dim t As Variant
    Dim d As Variant
    Dim m, n
    Dim i, j, k
    Dim a(2), r
    Dim cost
    
       m = Len(word1)
       n = Len(word2)
    
       ''This is the only way to use
       ''variables to dimension an array
       ReDim s(m)
       ReDim t(n)
       ReDim d(m, n)
    
       For i = 1 To m
           s(i) = Mid(word1, i, 1)
       Next
    
       For i = 1 To n
           t(i) = Mid(word2, i, 1)
       Next
    
       For i = 0 To m
           d(i, 0) = i
       Next
    
       For j = 0 To n
           d(0, j) = j
       Next
    
    
       For i = 1 To m
           For j = 1 To n
    
               If s(i) = t(j) Then
                   cost = 0
               Else
                   cost = 1
               End If
    
               a(0) = d(i - 1, j) + 1             '' deletion
               a(1) = d(i, j - 1) + 1             '' insertion
               a(2) = d(i - 1, j - 1) + cost      '' substitution
    
               r = a(0)
    
               For k = 1 To UBound(a)
                   If a(k) < r Then r = a(k)
               Next
    
               d(i, j) = r
    
           Next
    
       Next
    
       LevenshteinDistance = d(m, n)
    
    End Function