
UIMA Ruta: annotate a fixed-length sequence of words from a specific word list


I have a WORDTABLE containing numbers expressed as strings (zero, one, two, ..., n) plus the respective digits as features. I am trying to annotate a sequence of a fixed length of stringified numbers.

E.g.:

one two three four -> should be annotated

one two three four five six -> should not be annotated

So far I have:

WORDTABLE numbers = "numbers.csv";

DECLARE Annotation number(STRING int_string, STRING digit);
DECLARE Annotation numberSequence;

Document{-> MARKTABLE(number, 1, numbers, "digit" = 2)};
(number number) {-> MARK(numberSequence)};

This matches a sequence of n stringified numbers. What I want is to fix the length of the sequence, something like:

number[4,4] {-> MARK(numberSequence)};

where the minimum and maximum number of consecutive stringified numbers are both set to the same value, for example 4. Is it possible to do this?


Solution

  • Here's an example rule that annotates a text position only if it holds exactly four consecutive annotations of the type number:

    ANY{-PARTOF(number)} @number[4,4] {-> MARK(numberSequence)} ANY{-PARTOF(number)};
    

    DISCLAIMER: I am a developer of UIMA Ruta
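
To make the matching logic of the rule above easier to follow, here is a minimal plain-Python sketch that emulates it on a list of tokens. It is not Ruta and does not use any UIMA API. The `NUMBER_WORDS` set is a hypothetical stand-in for the first column of numbers.csv, and `exact_runs` is a name invented for this illustration. The sketch assumes that each `ANY{-PARTOF(number)}` element has to match a real token, so a run touching the very start or end of the input is skipped; that is a reading of the rule, not verified behavior.

```python
from typing import List, Tuple

# Hypothetical stand-in for the words listed in numbers.csv.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def exact_runs(tokens: List[str], length: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) token indices, end exclusive, of every maximal
    run of number words that is exactly `length` tokens long."""
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() not in NUMBER_WORDS:
            i += 1
            continue
        # Extend the run as far as it goes, like a greedy quantifier.
        j = i
        while j < len(tokens) and tokens[j].lower() in NUMBER_WORDS:
            j += 1
        # Assumption: mirror the two ANY{-PARTOF(number)} elements by
        # requiring a non-number token on each side of the run.
        if j - i == length and i > 0 and j < len(tokens):
            spans.append((i, j))
        i = j
    return spans
```

Because a run is only accepted when it is maximal, the six-word sequence from the question yields no span at all, rather than a four-word span inside it. Dropping the `i > 0 and j < len(tokens)` check would also accept runs at the input boundaries.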