
Singular/plural searches and stemming


I'm looking for a simple solution for singular/plural keyword searches. I've heard about stemming, but I don't want to use all of its features, only the plural/singular transformation. The language is Dutch. I have already looked at http://www.snowball.tartarus.org. Does anyone know a simple way to make searches match both singular and plural forms? Thanks in advance.


Solution

  • Use a dictionary, a list of stopwords (those you don't want to singularize), plus the rules for the language. I can't help with Dutch specifically, but I can show you how it'd be done in Spanish, for instance:

    • Plurals end with s; if the word doesn't, then it's done
      • If it ends with s,
        • check whether it's a verb or conjugation ending with s; if it is, then it's done (verbs could be added to the stopwords list)
        • if it's not a verb, remove the s
        • if the resulting word exists in the dictionary, done
        • if it doesn't, remove the previous letter as well and check the dictionary again
        • if it's still not there, it's an exception you'll need to check manually and add to a list of exceptions (I can't think of any right now, but they always exist :)

    Of course this will not translate directly to Dutch.
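
The Spanish steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `singularize` and the tiny `DICTIONARY`, `STOPWORDS`, and `EXCEPTIONS` sets are hypothetical placeholders, and a real system would load a full word list for the target language.

```python
# Hypothetical toy data; a real system would load a full Spanish word list.
DICTIONARY = {"casa", "libro", "flor", "papel"}
# Words that must never be singularized (e.g. verb forms ending in "s").
STOPWORDS = {"cantas", "eres", "lunes"}
# Irregular plurals collected by manual review (pez -> peces is one such case).
EXCEPTIONS = {"peces": "pez"}


def singularize(word: str) -> str:
    """Return a singular form of `word`, following the steps listed above."""
    w = word.lower()
    # Plurals end with "s"; anything else is left alone.
    if not w.endswith("s"):
        return w
    # Verbs and other protected words ending in "s" are left alone.
    if w in STOPWORDS:
        return w
    # Remove the final "s" and look the result up in the dictionary.
    candidate = w[:-1]
    if candidate in DICTIONARY:
        return candidate
    # Not found: remove the previous letter too ("flores" -> "flor").
    candidate = candidate[:-1]
    if candidate in DICTIONARY:
        return candidate
    # Still not found: use a hand-coded exception if one exists,
    # otherwise return the word unchanged so it can be reviewed manually.
    return EXCEPTIONS.get(w, w)


if __name__ == "__main__":
    for term in ["casas", "flores", "cantas", "peces", "libro"]:
        print(term, "->", singularize(term))
```

For search, one option is to run the same function over both the indexed terms and the query terms, so that singular and plural forms end up as the same key.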

    In general, stemmers already exist and provide most of what you need. Why don't you want to use one?