
Setting feature value to the count of containing annotation in UIMA Ruta


I've got a RUTA script where all the sentences have been annotated with a Sentence annotation and various words and phrases have been annotated with their own specific annotations. That all works as expected.

Each one of those annotations has a feature for the index of the sentence that contains it. As a contrived example, given the text

Jack and Jill went up the hill. Jack fell down.

I have a "down" annotation whose sentence index I want to set to 2, indicating that it is in the second sentence. I'm thinking of something like the following, although I know that's not correct.

Sentence{CONTAINS(Down) -> Down.sentence_index = index

where the index is the index of the sentence. Is this possible with RUTA? If so, what's the appropriate script? I can do this in a separate analysis engine and have done so in the past, but I'm hoping to replace some of that with Ruta scripts.

thanks,

Nick


Solution

  • There are several ways to express this in UIMA Ruta. My first guess would be something like:

    // just to have an executable example
    DECLARE Sentence;
    DECLARE Annotation Down (INT sentence_index);
    ((# PERIOD){-> Sentence})+;
    "down" -> Down;
    
    // the actual rule with a helper variable
    INT index;
    Sentence{CONTAINS(Down), CURRENTCOUNT(Sentence, index)} -> 
       {Down{-> Down.sentence_index = index};};
    

    The rule matches on all sentences that contain a Down annotation. Additionally, CURRENTCOUNT counts the Sentence annotations up to the matched position and stores the value in the variable index. Then, an inlined rule (indicated by the first "->") matches on all Down annotations within the matched sentence and assigns the value of the variable to the feature of the matched Down annotation. Depending on whether you want to start with 0 or 1, you may need to increment the assigned value:

    ... Down.sentence_index = (index+1)};};
    

    The CURRENTCOUNT condition can also accept a min and a max value in order to act like a real condition. It is really old, so I don't know how it scales for large documents.
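
    To make the intended result concrete outside of Ruta, here is a minimal plain-Python sketch of what the rule is meant to compute: for each Down span, the index of the Sentence span that covers it. This is only a model, not Ruta and not the UIMA API. The function names (sentence_spans, find_spans, sentence_indices), the first_index parameter, and the (begin, end) offset representation are assumptions made for illustration.

```python
# Plain-Python model of the intended result; NOT Ruta and NOT the UIMA API.
# Annotations are modeled as (begin, end) character offsets (an assumption).
import re


def sentence_spans(text):
    """Hypothetical helper: one span per period, like the simple '# PERIOD' rule."""
    spans = []
    start = 0
    for match in re.finditer(r"\.", text):
        begin = start
        # Skip leading whitespace so the span starts at the first token.
        while begin < match.end() and text[begin].isspace():
            begin += 1
        spans.append((begin, match.end()))
        start = match.end()
    return spans


def find_spans(text, word):
    """Hypothetical helper: spans of every literal occurrence of `word`."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(word), text)]


def sentence_indices(sentences, targets, first_index=1):
    """Map each target span to the index of the sentence that covers it."""
    result = {}
    for index, (s_begin, s_end) in enumerate(sentences, start=first_index):
        for t_begin, t_end in targets:
            if s_begin <= t_begin and t_end <= s_end:
                result[(t_begin, t_end)] = index
    return result


text = "Jack and Jill went up the hill. Jack fell down."
sentences = sentence_spans(text)
downs = find_spans(text, "down")
print(sentence_indices(sentences, downs))  # {(42, 46): 2}
```

    Passing first_index=0 gives zero-based indices instead, which corresponds to the choice between starting with 0 or 1 mentioned above.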

    Here's another example, but this time without the CURRENTCOUNT condition and for storing the index in the Sentence annotation:

    DECLARE Annotation Sentence (INT index);
    DECLARE Annotation Down (INT sentence_index);
    INT index;
    
    (# PERIOD){-> Sentence, ASSIGN(index, (index + 1)), Sentence.index = index};
    PERIOD (# PERIOD){-> Sentence, ASSIGN(index, (index + 1)), Sentence.index = index};
    "down" -> Down;
    
    Sentence{CONTAINS(Down) -> ASSIGN(index, Sentence.index)} 
      ->  {Down{-> Down.sentence_index = index};};
    

    Note that the rule for creating Sentence annotations from the first example cannot be used here, since it produces only a single rule match and its actions are applied to the matched fragments of that one match. The rules in the second example produce many rule matches, so the actions are applied before the next rule match is processed. Copying feature values between different matching scopes is not really nice, but that may be improved at some point.

    If you already have Sentence annotations, you can assign the index with something like:

    Sentence{-> ASSIGN(index, (index + 1)), Sentence.index = index};
    

    Examples have been tested with UIMA Ruta 2.2.1-SNAPSHOT.

    (I am a developer of UIMA Ruta)