Search code examples
sqlsql-serversortingsimilarity

How to find strings which are similar to given string in SQL server?


I have a SQL server table which contains several string columns. I need to write an application which gets a string and search for similar strings in SQL server table.

For example, if I give the "مختار" or "مختر" as input string, I should get these from SQL table:

1 - مختاری
2 - شهاب مختاری
3 - شهاب الدین مختاری

I've searched the net for a solution but I have found nothing useful. I've read this question , but this will not help me because:

  1. I am using MS SQL Server not MySQL
  2. my table contents are in Persian, so I can't use Levenshtein distance and similar methods
  3. I prefer an SQL Server only solution, not an indexing or daemon based solution.

The best solution would be a solution which help us sort result by similarity, but, its optional.

Do you have any suggestion for that?

Thanks


Solution

  • Hmm.. considering that you read the other post you probably know about the like operator already... maybe your problem is "getting the string and searching for something similar"?

    --This part searches for a string you want
    
    declare @MyString varchar(max)
    
    set @MyString = (Select column from table
    where **LOGIC TO FIND THE STRING GOES HERE**)
    
    
    --This part searches for that string
    
    select searchColumn, ABS(Len(searchColumn) - Len(@MyString)) as Similarity
    from table where data LIKE '%' + @MyString + '%'
    Order by Similarity, searchColumn
    

    The similarity part is something like the thing you posted. If the strings are "more similar" meaning that they have a similar length, they will be higher on the results query. The absolute part can be avoided obviously but I did it just in case.

    Hope that helps =-)