UIMA Ruta - Abbreviations


Can I segment a word into its individual letters using UIMA Ruta?

Example:

1. (WHO)
2. (APIAs)

Script:

DECLARE NEW;
BLOCK (foreach)CAP{}
{
W{REGEXP(".")->MARK(NEW)};

}

Solution

  • Yes, this is achieved with simple regex rules in UIMA Ruta:

    DECLARE Char;
    CAP->{"."->Char;};
    

    You cannot use normal rules for this because you need to match on something smaller than RutaBasic. The only option is to use regexp rules, which operate directly on the text instead of on annotations. Be careful, though, since this can create a very large number of annotations.

    Some explanation for the somewhat compact rule: CAP->{"."->Char;};

    CAP // the only rule element of the rule: match on each CAP annotation
    ->{// indicates that inlined rules follow that are applied in the context of the matched annotation.
    "." // a regular expression matching on each character
    -> Char // the "action" of the regex rule: create an annotation of the type Char for each match of the regex 
    ;}; // end of regex rule, end of inlined rules, end of actual rule
    

    In summary, the rule iterates over all CAP annotations, applies the regular expression to the covered text of each one, and creates an annotation for every match.

    You can of course also use a BLOCK instead of an inlined rule.
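
    A minimal sketch of that BLOCK variant, assuming the regex rule inside the block is restricted to the text covered by the current CAP annotation in the same way as in the inlined rule (the block name ForEachCap is an arbitrary label chosen for illustration):

    DECLARE Char;
    // open a block for each CAP annotation (name is illustrative)
    BLOCK(ForEachCap) CAP{} {
        // regex rule: annotate every single character inside the current CAP
        "." -> Char;
    }

    The inlined form is more compact; a BLOCK may be easier to read if more rules should later be applied within each CAP.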

    DISCLAIMER: I am a developer of UIMA Ruta