This might be a trivial question, but I'm new to Ruta so bear with me please. My testdata consists of numbers in the following format:
0.1mm 0,11mm 1.1mm 1,1mm 1mm
I use the following rule to annotate the first four examples:
((NUM(COMMA|PERIOD)NUM) W{REGEXP("mm")}) {-> nummm};
Document{->MARK(nummm)};
Now I want to annotate "1mm", for example, too, but I'm kind of stuck right now, because I have no idea how to do this. I tried negating Conditions, like AFTER (as in "if NUM mm not after comma or period"), but it either didn't work or the syntax was wrong. Any help would be appreciated!
EDIT: I should add that I want to annotate "1mm", but not the 1mm part after a comma or period, as of right now i basically annotate everything twice.
There are really a lot of ways to specify this in UIMA Ruta.
Here's the first thing that came to my mind:
(NUM{-PARTOF(nummm)} (PM{PARTOF({COMMA,PERIOD})} NUM)? W{REGEXP("mm")}){-> nummm};
This is probably not the "best" rule but should do what you want. There are three main changes:
NUM
.nummm
annotation. The -
is a shortcut for the NOT
condition.This rule works because the actions of a rule match are already executed before the next rule match is considered.
DISCLAIMER: I am a developer of UIMA Ruta.