sql nlp stemming lemmatization

SQL word root matching


I'm wondering whether the major SQL engines (MS SQL, Oracle, MySQL) can recognize that two words are related because they share the same root.

We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.

But do SQL engines have functions that can match "network" when searching for "networking"?

Thanks a lot.


Solution

  • This functionality is provided by a stemmer: an algorithm that derives the stem from any form of a word.

    This can be quite complex: for instance, the Russian words шёл and иду are different forms of the same verb, though they do not share a single letter (ironically, this is also true in English: went and go).

    Word breaking can also be quite a complex task for some languages that use no spaces between words.

    SQL Server supports pluggable stemmers and word breakers for its full-text search engine:

    http://msdn.microsoft.com/en-us/library/ms142509.aspx
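
To make the idea of a stemmer concrete, here is a minimal sketch in Python. It is not how SQL Server's stemmers work. The `naive_stem` function, its three-suffix list, and the `words` table are all hypothetical, invented for illustration. The function is registered as a user-defined SQL function on an in-memory SQLite database, so that a search for "networking" can match "network".

```python
import sqlite3

# Hypothetical, deliberately naive suffix list.
# Real stemmers (e.g. Porter) use far richer rules.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(conn, term):
    """Return stored words whose naive stem equals the stem of `term`."""
    rows = conn.execute(
        "SELECT word FROM words WHERE stem(word) = stem(?) ORDER BY word",
        (term,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the (assumed) name stem().
conn.create_function("stem", 1, naive_stem)
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("network",), ("networks",), ("networked",), ("go",), ("went",)],
)

matches = search(conn, "networking")  # regular forms share the stem "network"
irregular = search(conn, "went")      # suffix stripping cannot relate "went" to "go"
print(matches)
print(irregular)
```

Suffix stripping handles the regular forms but finds nothing in common between "went" and "go", which is why production stemmers rely on language-specific rules or dictionaries rather than string manipulation alone.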