
UIMA Ruta: Creating new annotations by combining existing annotations' features in plain Java


I'm trying to convert the following logic into a UIMA Ruta Rule:

Sentence {->NewAnnotation} IF Sentence.part1 contains Constituent.label="VB" AND Sentence.part2 contains Constituent.label="VBZ"

In other words, I need to create a new annotation covering the entire Sentence whenever its features part1 and part2 each contain a specific combination or sequence of POS tags (Constituent.label).

My first idea was to use the CONTAINS condition together with a STRINGLIST (and configuration parameters), as follows:

STRINGLIST posList; //assuming it is declared
Sentence{-> NewAnnotation} <-{Sentence.part1{CONTAINS(posList, Constituent.label)};};

But this doesn't produce any annotations (although it doesn't fail either).

Then I considered the GETFEATURE action: store the Sentence feature (Sentence.part1) in a string variable and use it separately in the main rule. However, GETFEATURE stores the feature value as a STRING, so I cannot use it to produce annotations, since that requires an ANNOTATION type. The same happens with the MATCHEDTEXT action.

I understand the rule I want to build is quite complex, but I believe Ruta is the most suitable option for such tasks. Can you please suggest how to approach this problem?


Solution

  • As @PeterKluegl already stated, the solution to the original question would be:

    Sentence{-> NewAnnotation} <-{Sentence.part1<-{Constituent.label=="VB";} %
                                  Sentence.part2<-{Constituent.label=="VBZ";};};
    

    Note that this rule works only if the Sentence features (e.g. part1) are annotations, not strings as they are in my case.

    For anyone interested, here is the approach I used in my case:

    • Store the Sentence features in separate annotations (part1, part2) while keeping the link between each of them and its parent Sentence (this is possible in UIMA via parent pointers).
    • Apply the following rule:

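      // Ruta variables that hold the ids used to link each part to its Sentence
      String rutaRule = "STRING id;"
              + "STRING part1Id;"
              + "STRING part2Id;"
              // read the Sentence's "matchId" and each part's "parent" feature into the variables
              + "Sentence{->GETFEATURE(\"matchId\", id)};"
              + "part1{->GETFEATURE(\"parent\", part1Id)};"
              + "part2{->GETFEATURE(\"parent\", part2Id)};"
              // create NewAnnotation on the Sentence if the ids agree and both parts contain the required labels
              + "Sentence{AND(IF(id == part1Id), IF(id == part2Id))-> NewAnnotation} <-"
              + "{part1<-{Constituent.label == \"VBD\";} % "
              + "part2<-{Constituent.label == \"MD\" # Constituent.label == \"VBN\";};};";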
      
      Ruta.apply(cas,rutaRule);
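
If the POS labels come from configuration rather than being hard-coded, the rule string above could be assembled by a small helper. The following is only a sketch under stated assumptions: RutaRuleBuilder, labelSequence and buildRule are hypothetical names (not part of UIMA Ruta), the part1/part2 types and the matchId/parent features are taken from the rule above, and several labels for one part are assumed to be joined with the # wildcard, as in the original part2 condition.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of UIMA Ruta): assembles the rule script as plain text. */
final class RutaRuleBuilder {

    private RutaRuleBuilder() {
    }

    /**
     * Renders the labels as a Ruta sequence joined by the '#' wildcard, e.g.
     * Constituent.label == "MD" # Constituent.label == "VBN".
     */
    static String labelSequence(List<String> labels) {
        if (labels == null || labels.isEmpty()) {
            throw new IllegalArgumentException("at least one label is required");
        }
        for (String label : labels) {
            // Assumption: labels are simple POS tags; quoting is not handled here.
            if (label.contains("\"") || label.contains("\\")) {
                throw new IllegalArgumentException("unsupported character in label: " + label);
            }
        }
        return labels.stream()
                .map(label -> "Constituent.label == \"" + label + "\"")
                .collect(Collectors.joining(" # "));
    }

    /** Builds the same script as the hard-coded string above, one statement per line. */
    static String buildRule(List<String> part1Labels, List<String> part2Labels) {
        return String.join("\n",
                "STRING id;",
                "STRING part1Id;",
                "STRING part2Id;",
                "Sentence{->GETFEATURE(\"matchId\", id)};",
                "part1{->GETFEATURE(\"parent\", part1Id)};",
                "part2{->GETFEATURE(\"parent\", part2Id)};",
                "Sentence{AND(IF(id == part1Id), IF(id == part2Id))-> NewAnnotation} <-"
                        + "{part1<-{" + labelSequence(part1Labels) + ";} % "
                        + "part2<-{" + labelSequence(part2Labels) + ";};};");
    }
}
```

Calling RutaRuleBuilder.buildRule(List.of("VBD"), List.of("MD", "VBN")) should yield the same rules as the hard-coded string, apart from the line breaks between statements, and the result can be passed to Ruta.apply(cas, ...) in the same way.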
      

    Hope this helps.