
How to compare features of two different annotations within a Ruta rule?


I am processing a text with UIMA Ruta and want to remove duplicated annotations. I consider an annotation to be duplicated if certain features, for instance a name, have the same value. I have unsuccessfully tried different approaches, but I hope the following examples will give an idea of what I am trying to do:

STRING nameVal;
Person {-> GETFEATURE("name", nameVal)}  
ANY+? 
Person.name == nameVal {-> UNMARK(Person)};

I have also tried this variation:

STRING nameVal;
Person {-> GETFEATURE("name", nameVal)}  
ANY+? 
Person {-> UNMARK(Person)} <- { Person.name == nameVal; };

If I replace the variable nameVal with a literal (see next example), the rules work well and seem to be close to what I want, but not quite.

Person
ANY+? 
Person.name == "Mustermann" {-> UNMARK(Person)};

I believe the problem is that the global variable has not yet been initialized when the comparison is evaluated. Is there a way in Ruta to compare a feature of the first matched annotation with a feature of the last matched annotation inside the same rule?


Solution

  • Yes, the problem is that actions are executed only once the complete rule has matched, that is, after all conditions have been evaluated. You need an action to assign the feature value to a variable, but a condition to compare that variable with another feature, so the variable is still unset when the comparison runs.

    However, there are still several ways to solve this in Ruta, e.g., with additional rules, a BLOCK, or inlined rules. The best way is to use label expressions; UIMA Ruta 2.5.0 makes this much easier. You can write something like this:

    p1:Person # p2:Person{p1.name == p2.name -> UNMARK(Person)};
    

    or

    p1:Person # Person.name==p1.name{ -> UNMARK(Person)};
    

    You can probably write a faster rule if you use a STRINGLIST: if the value is already contained in the list, unmark the annotation; otherwise, add the value to the list.

    DISCLAIMER: I am a developer of UIMA Ruta
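
The "seen list" idea from the answer can be illustrated outside of Ruta. The following is a minimal plain-Java sketch of the logic only: it does not use the UIMA or Ruta API, and the `Person` class and the `keepFirstPerName` method are hypothetical names introduced for illustration. It assumes the annotations are visited in document order and that the first occurrence of each name is the one to keep. A `HashSet` plays the role of the STRINGLIST here, which is a substitution made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class SeenNameDedup {

    /** Hypothetical stand-in for a Person annotation: offsets plus the "name" feature. */
    public static final class Person {
        public final int begin;
        public final int end;
        public final String name;

        public Person(int begin, int end, String name) {
            this.begin = begin;
            this.end = end;
            this.name = name;
        }
    }

    /**
     * Walks the annotations once, in the order given, and keeps only the
     * first Person for each distinct name. Later Persons with a name that
     * was already seen are dropped, which corresponds to UNMARK in the rule.
     */
    public static List<Person> keepFirstPerName(List<Person> personsInDocumentOrder) {
        Set<String> seenNames = new HashSet<>();
        List<Person> kept = new ArrayList<>();
        for (Person p : personsInDocumentOrder) {
            // Set.add returns false when the name is already present,
            // so the "contains" check and the "add" happen in one step.
            if (seenNames.add(p.name)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

Each annotation is inspected once and checked against the names collected so far, instead of being compared with every earlier annotation, which is presumably where the speed-up mentioned in the answer would come from.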