I'm new to UIMA Ruta (but I have experience with plain UIMA and uimaFIT) and I'd like to know what the best approach is, performance-wise, for finding money values: applying a regex to the sentence, or writing a rule (and if so, what would it look like)?
My values would look like this:
1.000,00 1000,00 1.100.000,00 100,00 or even 1000000,00 is possible
I created a rule like
(NUM{BEFORE(PERIOD)})*(NUM{AFTER(COMMA)}) {-> MARK(Value, 1, 2)};
(Even then I can't always get it to work properly, and it doesn't cover all my cases.)
What would be easier and less resource-consuming?
A regular expression is probably the fastest option if you only need to work at the character level and do not need any annotations.
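For illustration, here is a minimal sketch of the plain regex route in Java. The pattern is an assumption derived only from the sample values in the question (optional period-separated thousands groups, a comma, exactly two decimals); the class and method names are hypothetical and nothing here is part of UIMA or Ruta.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA or Ruta.
class MoneyValueFinder {

    // Assumed format: either groups of three digits separated by periods
    // (1.000,00) or an ungrouped run of digits (1000,00), followed by a
    // comma and exactly two decimal digits. The lookarounds keep the match
    // from starting or ending in the middle of a longer number.
    private static final Pattern MONEY = Pattern.compile(
            "(?<![\\d.,])(?:\\d{1,3}(?:\\.\\d{3})+|\\d+),\\d{2}(?!\\d)");

    // Returns every substring of the text that matches the assumed format.
    static List<String> find(String text) {
        List<String> values = new ArrayList<>();
        Matcher matcher = MONEY.matcher(text);
        while (matcher.find()) {
            values.add(matcher.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(find("Totals: 1.000,00 and 1000,00, then 1.100.000,00."));
    }
}
```

If annotations are needed after all, the offsets from `matcher.start()` and `matcher.end()` could be used to create them in the CAS; that step is left out here.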
With normal matching rules in UIMA Ruta, it depends on how flexible the rules should be. Should they also detect numbers in other locales, such as English or French? After all, the runtime also depends on how many numbers the document contains and on whether Ruta is tuned for the use case (lexer, internal indexing, ...).
Your rule won't work as expected for two reasons. First, an optional element at the beginning of a rule is not actually treated as optional (unless there is a manual anchor). Second, the BEFORE condition will exclude at least the first number.
This rule should do what you want, but is certainly not the fastest:
(NUM{-PARTOF(Value)} (PERIOD NUM{REGEXP("...")})* COMMA NUM{REGEXP("..")}){-> Value};
DISCLAIMER: I am developer of UIMA Ruta