vb.net, function, hashset

How to find the best match using HashSet(Of String), ignoring case


I found the following function, which compares a string against a List(Of String) and returns the best match. Is it possible to make the comparison ignore case, and if so, how would I change this code?

    Public Shared Function FindBestMatch(ByVal stringToCompare As String, ByVal strs As IEnumerable(Of String)) As String
        Dim strCompareHash As HashSet(Of String) = stringToCompare.Split(Microsoft.VisualBasic.ChrW(32)).ToHashSet
        Dim maxIntersectCount As Integer = 0
        Dim bestMatch As String = String.Empty
        For Each str As String In strs
            Dim strHash As HashSet(Of String) = str.Split(Microsoft.VisualBasic.ChrW(32)).ToHashSet
            Dim intersectCount As Integer = strCompareHash.Intersect(strHash).Count
            If (intersectCount > maxIntersectCount) Then
                maxIntersectCount = intersectCount
                bestMatch = str
            End If

        Next
        Return bestMatch
    End Function

Any guidance would be much appreciated

I haven't been able to implement any of the suggestions I've seen so far.


Solution

  • The HashSet(Of T) class has a constructor that lets you specify the comparer. The same applies to the LINQ method ToHashSet, which has an overload that accepts a comparer.

    Dim comparer As IEqualityComparer(Of String) = StringComparer.OrdinalIgnoreCase
    Dim strCompareHash As HashSet(Of String) = stringToCompare.
        Split(Microsoft.VisualBasic.ChrW(32)).
        ToHashSet(comparer)
    

    The whole method could be rewritten with LINQ to be more compact:

    Public Shared Function FindBestMatch(ByVal stringToCompare As String, ByVal strs As IEnumerable(Of String)) As String
        Dim comparer As IEqualityComparer(Of String) = StringComparer.OrdinalIgnoreCase
        Dim strCompareHash As HashSet(Of String) = stringToCompare.Split(ChrW(32)).ToHashSet(comparer)
        Return strs.
            Select(Function(s) (Str:=s, Tokens:=s.Split(ChrW(32)))).
            OrderByDescending(Function(x) strCompareHash.Intersect(x.Tokens, comparer).Count()).
            First().Str
    End Function
    

    This assumes that strs is never Nothing or empty; otherwise the query throws an exception. You might want to check for that and return Nothing early. It would also be good to add an optional parameter so the caller can pass the IEqualityComparer(Of String).
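
The two suggestions above (an early return and an optional comparer parameter) could be sketched roughly as follows. This is only one possible shape, not the answer's exact code: the module name BestMatchSketch, the choice to skip Nothing entries in strs, and the default of StringComparer.OrdinalIgnoreCase are assumptions made for illustration. It uses a plain loop and the HashSet(Of String) constructor instead of ToHashSet, so it does not depend on that extension method being available. Like the LINQ version, it returns the first candidate when several tie, even with zero shared words, whereas the original function returned String.Empty in that case.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical module name, used only for this sketch.
Public Module BestMatchSketch

    ' Returns the candidate that shares the most space-separated words with
    ' stringToCompare, or Nothing when there is nothing to compare.
    Public Function FindBestMatch(stringToCompare As String,
                                  strs As IEnumerable(Of String),
                                  Optional comparer As IEqualityComparer(Of String) = Nothing) As String
        ' Guard clause: return early instead of letting the query throw.
        If stringToCompare Is Nothing OrElse strs Is Nothing Then Return Nothing

        ' Assumption: fall back to case-insensitive ordinal comparison.
        If comparer Is Nothing Then comparer = StringComparer.OrdinalIgnoreCase

        Dim compareSet As New HashSet(Of String)(stringToCompare.Split(" "c), comparer)

        Dim bestMatch As String = Nothing
        Dim bestCount As Integer = -1
        For Each candidate As String In strs
            ' Assumption: Nothing entries are skipped rather than treated as errors.
            If candidate Is Nothing Then Continue For
            Dim count As Integer = compareSet.Intersect(candidate.Split(" "c), comparer).Count()
            ' Strict ">" keeps the earliest candidate on ties.
            If count > bestCount Then
                bestCount = count
                bestMatch = candidate
            End If
        Next
        ' Still Nothing when strs was empty.
        Return bestMatch
    End Function

    Public Sub Main()
        Dim candidates = {"the quick brown fox", "A Lazy Dog"}
        ' Default comparer ignores case: "Quick" and "Brown" both match.
        Console.WriteLine(FindBestMatch("Quick Brown Dog", candidates))
        ' prints "the quick brown fox"
        ' Case-sensitive comparer: only "Dog" matches.
        Console.WriteLine(FindBestMatch("Quick Brown Dog", candidates, StringComparer.Ordinal))
        ' prints "A Lazy Dog"
    End Sub
End Module
```

Passing Nothing for the comparer keeps existing two-argument calls working, while callers that need culture-aware or case-sensitive matching can supply their own comparer.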