sql, sql-server, t-sql, wildcard, sql-like

SQL like '%term%' except without letters


I'm searching against a table of news articles. The two relevant columns are ArticleTitle and ArticleText. When I want to search an article for a particular term, I started out with

column LIKE '%term%'.

However, that gave me a lot of articles with the term inside anchor links, for example <a href="example.com/*term*">, which could return an irrelevant article.

So then I switched to

column LIKE '% term %'.

The problem with this query is that it didn't find articles whose title or text began or ended with the term. It also didn't match things like term- or term's, which I do want.

It seems like the query I want should be able to do something like this

'%[^a-z]term[^a-z]%'

This should exclude terms within anchor links but match everything else. I think this query still excludes strings that begin or end with the term. Is there a better solution? Does SQL Server's full-text indexing solve this problem?

Additionally, would it be a good idea to store ArticleTitle and ArticleText as HTML-free columns? Then I could use '%term%' without matching anchor links. These would be two extra columns, though, because I will eventually need the original HTML for formatting purposes.

Thanks.


Solution

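  • SQL Server's LIKE supports a limited set of regex-like wildcards, including character classes such as the [^a-z] you described, although it is not a full regular expression engine.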
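
    As a rough sketch of that LIKE approach, padding the column with a space on each side lets the [^a-z] class also match when the term is the very first or last word. The table name dbo.Articles and the variable @term are assumptions for illustration, and ArticleText is assumed to be an nvarchar column:

    ```sql
    -- Hypothetical table name (dbo.Articles) and variable (@term).
    DECLARE @term nvarchar(100) = N'term';

    SELECT ArticleTitle
    FROM dbo.Articles
    -- The added spaces give the pattern a non-letter to match
    -- at the very start and very end of the text.
    WHERE N' ' + ArticleText + N' '
          LIKE N'%[^a-z]' + @term + N'[^a-z]%';
    ```

    Under a case-sensitive collation the class would need to be [^a-zA-Z]. Note that this still matches a term inside a URL whenever it is surrounded by non-letters such as / or ", and the leading % forces a scan of every row.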

    A better option is to use full-text search:

    WHERE CONTAINS(ArticleTitle, 'term')
    

    This uses the full-text index properly (the LIKE '%term%' query starts with a wildcard, so it cannot seek on an ordinary index and is slow), and provides other benefits in the search algorithm.
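
    CONTAINS only works once a full-text index exists on the columns. A minimal setup might look like the following, where the catalog name ArticleCatalog, the table name dbo.Articles, and the unique key index name PK_Articles are all assumed names, not taken from the question:

    ```sql
    -- Hypothetical names: ArticleCatalog, dbo.Articles, PK_Articles.
    CREATE FULLTEXT CATALOG ArticleCatalog AS DEFAULT;
    GO

    -- KEY INDEX must name a unique, single-column, non-nullable index.
    CREATE FULLTEXT INDEX ON dbo.Articles (ArticleTitle, ArticleText)
        KEY INDEX PK_Articles
        ON ArticleCatalog;
    GO

    -- Search both columns for the whole word.
    SELECT ArticleTitle
    FROM dbo.Articles
    WHERE CONTAINS((ArticleTitle, ArticleText), N'term');
    ```

    The index is populated in the background, so results may be incomplete right after it is created. Full-text search matches whole words, but on a plain character column it will most likely still index words that appear inside markup, so it does not by itself filter out anchor links.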

    Additionally, you might benefit from storing a plaintext version of the article alongside the HTML version and running your search queries against it.
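
    One possible shape for that, assuming the HTML is stripped in application code before each insert or update (the column names ArticleTitlePlain and ArticleTextPlain and the table name dbo.Articles are made up for this sketch):

    ```sql
    -- Hypothetical columns holding the HTML-free copies.
    ALTER TABLE dbo.Articles
        ADD ArticleTitlePlain nvarchar(max) NULL,
            ArticleTextPlain  nvarchar(max) NULL;
    GO

    -- Search the plaintext copy; keep the original HTML columns for display.
    SELECT ArticleTitle, ArticleText
    FROM dbo.Articles
    WHERE ArticleTextPlain LIKE N'%term%';
    ```

    The same plaintext columns could also be the ones covered by the full-text index, which would keep link targets out of the search results.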