UIMA Ruta negating conditions


This might be a trivial question, but I'm new to Ruta, so please bear with me. My test data consists of numbers in the following format:

0.1mm 0,11mm 1.1mm 1,1mm 1mm

I use the following rule to annotate the first four examples:

((NUM(COMMA|PERIOD)NUM) W{REGEXP("mm")}) {-> nummm};
Document{->MARK(nummm)};

Now I also want to annotate values such as "1mm", but I'm stuck because I have no idea how to do this. I tried negating conditions such as AFTER (as in "if NUM mm is not after a comma or period"), but either it didn't work or my syntax was wrong. Any help would be appreciated!

EDIT: I should add that I want to annotate a standalone "1mm", but not the "1mm" part that follows a comma or period. Right now I basically annotate everything twice.


Solution

  • There are really a lot of ways to specify this in UIMA Ruta.

    Here's the first thing that came to my mind:

    (NUM{-PARTOF(nummm)} (PM{PARTOF({COMMA,PERIOD})} NUM)? W{REGEXP("mm")}){-> nummm};
    

    This is probably not the "best" rule but should do what you want. There are three main changes:

    • I made the middle part of the rule optional so that it also matches on a single NUM.
    • I added the negated PARTOF condition to the first rule element, so the match fails if the starting point is already covered by a nummm annotation. The - is a shortcut for the NOT condition.
    • I replaced the expensive disjunctive composed rule element with a simple one, because the disjunction is not really necessary here.

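    This rule works because the actions of a rule match are executed before the next rule match is considered. For "0.1mm", the match that starts at "0" creates the nummm annotation first, so when the rule later tries to start at the "1" after the period, that NUM is already part of a nummm annotation and the negated PARTOF rejects it.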
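
    For reference, a minimal self-contained script could look like the sketch below. It assumes that nummm is not declared anywhere else (hence the DECLARE line) and that the default whitespace filtering is active. The trace in the comments describes how the matching is expected to proceed on a shortened version of the test data; it is not verified output.

    ```
    // Assumption: nummm is not declared elsewhere in the script or type system.
    DECLARE nummm;

    // The first NUM must not already be inside a nummm annotation (- is the NOT shortcut).
    // The separator and the digits after it are optional; "mm" closes the match.
    (NUM{-PARTOF(nummm)} (PM{PARTOF({COMMA,PERIOD})} NUM)? W{REGEXP("mm")}){-> nummm};

    // Expected trace for "0.1mm 1mm":
    //   start at "0":                    matches "0.1mm", nummm is created immediately
    //   start at "1" (after the period): already part of nummm, so the match is rejected
    //   start at "1" (standalone):       optional part is skipped, matches "1mm"
    ```

    With this rule in place, each value in the test data should end up with exactly one nummm annotation.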

    DISCLAIMER: I am a developer of UIMA Ruta.