Search code examples
sqlstored-proceduressql-server-2012fuzzy-searchsoundex

Increase fault tolerance for SQL Soundex for fuzzy string matching


Is there a way to increase the fault tolerance of the SQL SOUNDEX function when used for fuzzy string matching? I am using this to search for supplier names. What I have is:

WHERE
    SOUNDEX(@SearchTerm) = SOUNDEX(s.Name)

This works to some degree, but I want to increase the fault tolerance a bit. For example:

SOUNDEX('test') = T230
SOUNDEX('tet') = T300
SOUNDEX('tets') = T320
SOUNDEX('tes') = T200

Ideally, I would like to have all these examples match. Also, a slight issue is when a supplier name consists of 2 words and the user searches by the second word, this will obviously work since the marching is dependent on the first letter of the word:

SOUNDEX('test supplier') = T230
SOUNDEX('supplier') = S146

Originally, I was making use of a fuzzy string comparison library in the web application that used various approximate string comparison algorithms, but it turns out when 20 people search concurrently for 5000+ suppliers, it's too much for the web server to handle. I'm now attempting to do this in a stored procedure so that the database returns only the searched results and not the complete list of suppliers to be searched.


Solution

  • You can implement an UDF and deploy it in the server using SQL CLR integration. (More spceifically: CLR Scalar-Valued Functions)

    I can assure this works because I'm doing exactly that same thing and looking over more than 350.000 rows (without any other filter) and it works really fast. It's a variation of Damerau-Leventshein algorithm, thus a time consuming algorithm.

    The difference between returning all the rows to the app, and filtering it in the app side, and doing the filtering directly inside SQL Server is huge.