UIMA Ruta: script for the combination of chars and numbers


I've just started with Ruta and I would like to write a rule that finds any combination of lowercase chars, numbers and dots (.).

(The Java regex for it would be ([a-z0-9.]+).)

For example:

abcd.03ef0.3abc

03a.bcd.03eeff903a.bc


Solution

  • Something like the following:

    (SW | NUM | PERIOD)+{-> MyType};
    

    Or, if uppercase chars should also be included, use W instead of SW:

    (W | NUM | PERIOD)+{-> MyType};
    

    If no spaces may occur in between, change the filtering settings before applying the rule:

    Document{-> RETAINTYPE(SPACE,BREAK,MARKUP)};
    

    To avoid overlapping matches, you can use MARKONCE instead of the implicit action, add a negated condition -PARTOF(MyType), or change the matching strategy with GREEDYANCHORING.
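
    As a minimal sketch, the pieces above could be combined into one script roughly as follows. The DECLARE line and the explicit MARK action are assumptions added for completeness (MyType may already be declared in your type system), and only one of the two variants should be active at a time:

    DECLARE MyType;

    // Make whitespace, line breaks and markup visible, so a match cannot span them.
    Document{-> RETAINTYPE(SPACE, BREAK, MARKUP)};

    // Variant 1: MARKONCE instead of the implicit action.
    (SW | NUM | PERIOD)+{-> MARKONCE(MyType)};

    // Variant 2 (alternative to variant 1): negated PARTOF condition.
    // (SW | NUM | PERIOD)+{-PARTOF(MyType) -> MARK(MyType)};

    With either variant, a token sequence such as abcd.03ef0.3abc should end up with a single MyType annotation rather than one for every possible starting token.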