
Detect the string below using a regex in UIMA Ruta


I'm trying to annotate the string below as a type using a regex in UIMA Ruta.

SAMPLE:

  • *******$10.00*

Other Variant:

  • *******$10.00***
  • *******$90.00*
  • *******$99**

    Regex: \*+\$\d+.\d+\*+

UIMA REGEX:

SPECIAL{REGEXP("\\*+\\$\\d+.\\d+\\*+") -> MARK(AmC,1)};

The rule does not detect the string. Since * is a regex quantifier, I escaped it with the escape character (\), but something is still missing. Is there a workaround?

PS: The regex works in other regex engines but not in UIMA Ruta.


Solution

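  • The REGEXP condition applies Java's Pattern matches() to the covered text of the annotation matched by the rule element. In your example, that annotation is SPECIAL, which covers a single special character. The regex is therefore tested against a single "*" on its own, and then again against each following special character (the digits and the period are not SPECIAL annotations, so they are skipped). A single character can never satisfy the whole pattern, so the rule never fires.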

    The REGEXP condition is not really suitable here. You should rather use a simple regex rule like:

    "\\*+\\$\\d+.\\d+\\*+" -> AmC;
    

    DISCLAIMER: I am a developer of UIMA Ruta
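
The difference between the two approaches can be reproduced in plain Java, outside of Ruta. This is only a minimal sketch: the class name RegexpConditionSketch, the helper methods and the sample sentence are made up for illustration, and the code mimics the behaviour described above rather than calling any Ruta API. matchesWhole() requires the entire input to match, as a condition evaluated on one SPECIAL annotation would, while findAll() scans the full text, as a standalone regex rule does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of UIMA Ruta.
class RegexpConditionSketch {

    // Same pattern as in the question, written as a Java string literal.
    static final Pattern AMOUNT = Pattern.compile("\\*+\\$\\d+.\\d+\\*+");

    // Whole-input match: the complete covered text must satisfy the pattern.
    static boolean matchesWhole(String coveredText) {
        return AMOUNT.matcher(coveredText).matches();
    }

    // Scan a longer text and collect every substring that matches.
    static List<String> findAll(String documentText) {
        List<String> hits = new ArrayList<>();
        Matcher m = AMOUNT.matcher(documentText);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Covered text of a single SPECIAL annotation: one character only.
        System.out.println(matchesWhole("*"));
        // The complete sample string from the question.
        System.out.println(matchesWhole("*******$10.00*"));
        // Searching inside a made-up sentence containing the sample.
        System.out.println(findAll("Total: *******$10.00* due"));
    }
}
```

The first call returns false because a lone "*" cannot satisfy the whole pattern, which is the situation the SPECIAL{REGEXP(...)} rule ends up in. The other two calls succeed because the pattern sees the complete string.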