UIMA RUTA wordlist matching issue


I am trying to match some multi-word tokens using UIMA RUTA 2.6.0. Some of the phrases partially overlap with each other, e.g. the same wordlist file has the following entries: "includes the", "include the", "in this", "in the".

My input file contains the following text: "1. "Agents or employees" includes the directors...". Obviously, "includes the" should match, but if the other three entries above are present in the wordlist, no match is found. Moreover, the order of the entries in the wordlist makes no difference: matching always fails.

This issue is not limited to a single file. So the question is: how can I fix it? Is there perhaps some setting of the RUTA annotator?


Solution

  • Whitespace in the wordlist entries can lead to missed matches. If the whitespace is not important, set the configuration parameter 'dictRemoveWS' to true.

    DISCLAIMER: I am a developer of UIMA Ruta
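
As a rough sketch, the parameter could be set in the XML descriptor of the Ruta analysis engine along these lines. The parameter name 'dictRemoveWS' is the one given in the answer above. The surrounding elements assume the standard UIMA descriptor layout and that the parameter is declared as a Boolean in that descriptor, so check this against your own descriptor file:

```xml
<!-- Assumed placement: inside <analysisEngineMetaData> of the Ruta engine descriptor. -->
<configurationParameterSettings>
  <nameValuePair>
    <!-- Parameter named in the answer: ignore whitespace when matching wordlist entries -->
    <name>dictRemoveWS</name>
    <value>
      <boolean>true</boolean>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

If the descriptor already contains a configurationParameterSettings element, only the nameValuePair entry needs to be added to it.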