
UIMA RUTA annotation at the beginning of sequence


I have a sequence of annotations that are instances of the same type (e.g. a sequence of CW annotations). I need to remove the first of them (more formally: remove an annotation that has no annotation of the same type before it in the document; less formally: remove the annotation at the beginning of the document). Example document: "Software StageTools". I tried many variants:

"Software"{-AFTER(CW) -> UNMARK(CW)} CW+;             //does not work
"Software"{BEFORE(CW) -> UNMARK(CW)} CW+;             //does not work
"Software"{-STARTSWITH(Document) -> UNMARK(CW)} CW+;  //does not work
CW{0, 0} "Software"{-> UNMARK(CW)} CW+;               //getting parsing error

...and some others. None of them works (maybe I could refer to the begin feature of the annotation, but that would not solve the formal issue).

So the question is: how can I tell RUTA to remove an annotation that has no annotation of the same type before it in the document?


Solution

  • There are many ways to do this. Here are two examples:

    # cw:CW.ct=="Software"{-> UNMARK(cw)} CW;
    

    Remove the first CW "Software" in the document if there is another CW following.

    ANY{-PARTOF(CW)} cw:@CW.ct=="Software"{-> UNMARK(cw)} CW;
    

    Remove any CW "Software" if there is a CW following and there is no CW preceding. If the document can start with the pattern, you need a second rule.

    Your second rule actually works for me. The last rule is not valid syntax: the min/max quantifier requires square brackets, as in [0,0]. However, even with the correct brackets it would not have the effect you want.

    DISCLAIMER: I am a developer of UIMA Ruta
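
The difference between the two rules can also be modeled outside Ruta. The following is a minimal Python sketch of the two prose descriptions above, not of how the Ruta engine actually matches. It assumes a document is a list of (text, is_cw) pairs in document order and that "following" and "preceding" mean directly adjacent tokens. The function names and the token representation are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (covered text, whether a CW annotation covers it).
Token = Tuple[str, bool]


def unmark_first_occurrence(tokens: List[Token], text: str = "Software") -> List[Token]:
    """Model of the first rule: drop CW from the first CW with the given
    text, but only if another CW directly follows it."""
    result = list(tokens)
    for i, (covered, is_cw) in enumerate(result):
        if is_cw and covered == text:
            if i + 1 < len(result) and result[i + 1][1]:
                result[i] = (covered, False)
            break  # only the first occurrence is considered
    return result


def unmark_sequence_starts(tokens: List[Token], text: str = "Software") -> List[Token]:
    """Model of the second rule: drop CW from every CW with the given text
    that has a non-CW token directly before it and a CW directly after it."""
    result = list(tokens)
    for i, (covered, is_cw) in enumerate(tokens):
        is_target = is_cw and covered == text
        non_cw_before = i > 0 and not tokens[i - 1][1]
        cw_after = i + 1 < len(tokens) and tokens[i + 1][1]
        if is_target and non_cw_before and cw_after:
            result[i] = (covered, False)
    return result


# The example document from the question: "Software StageTools".
doc = [("Software", True), ("StageTools", True)]

print(unmark_first_occurrence(doc))
print(unmark_sequence_starts(doc))
print(unmark_sequence_starts([("the", False)] + doc))
```

In this model the second function leaves the example document unchanged, because nothing precedes "Software". That is the case for which the answer says a second rule is needed.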