We have three tables to hold our products and keywords:
Product {int ID, string name, ...}
ProductKeyword {int productID, int keywordID}
Keyword {int ID, string keyword}
This SQL code returns the most relevant products to the least relevant products having keywords like the users search criteria. searchWordTable is a table of search words. @keywordCount
is a count of the search words. This returns all the products having one or more keywords, ordered by the quantity of keywords found for each product.
select productid, productname, count(*) * 1 / @keywordCount as percentRelevant
from (select keyword, productid, productname
from product
join productkeyword on ...
join keyword on ...
join searchWordTable on searchwordtable.keyword like
'%' + keyword.keyword + '%') K -- like join aweful
group by productid, productname
order by percentRelevant desc -- Most relevant first
The problem is this is a sequential search comparing every keyword we have. It's not bad, but searches can take a minute with a million records.
How could I rewrite the query to not use like
, hopefully use an indexed search, and get similar results? They use like
to get partial matches, e.g., 'bone' in 'boneless'.
Links to better SQL algorithms would surely be appreciated.
Like is killing you, especially with that leading wildcard which completely removes any indexes the columns may have.
You should look into using SQL Server's full-text indexing. contains
will probably be much faster and is far more powerful for partial matches of the nature you appear to be doing.