Tags: java, text-mining, gate

Can we group annotations in GATE?


How can we group all annotations between two annotations?

I'm new to GATE and am trying to group annotations together, but I'm not sure whether this is possible. Please help. For example, in the following text:

Page-1
Age:53 
Person: Nathan

Page-2
Treatment : Initial Evaluation
History: Yes

Page-3
..........

My Gazetteer list contains different tags: a page tag for each page number, plus Age, Person, Treatment, History, etc. I want to group all tags from Page-1 to Page-2 under a Page-1 annotation, and all tags between Page-2 and Page-3 under Page-2.

Please let me know if more information is required.

Thanks in advance.


Solution

I'm not entirely sure what you mean by "group together", but you can certainly create annotations that span the content of each "page". Assuming you have a PageNumber annotation on each "Page-1", "Page-2", etc., you can use something like the following to create annotations spanning from one PageNumber to the next. I'm using a control = once JAPE grammar for this; you could equivalently use a Groovy script or a custom PR (processing resource).

    Imports: { import static gate.Utils.*; }
    Phase: PageSpans
    Input: PageNumber
    Options: control = once
    
    Rule: PageSpan
    ({PageNumber})
    -->
    {
      try {
        List<Annotation> numbers = inDocumentOrder(inputAS.get("PageNumber"));
        for(int i = 0; i < numbers.size(); i++) {
          outputAS.add(start(numbers.get(i)), // from start of this PageNumber, to...
                       (i+1 < numbers.size()
                         ? start(numbers.get(i+1)) // start of the next number, or...
                         : end(doc) // ...if no more PageNumbers then end of document
                       ),
                       "Page",
                       // store the text under the PageNumber as a feature of Page
                       featureMap("id", stringFor(doc, numbers.get(i))));
        }
      } catch(InvalidOffsetException e) {
        throw new JapeException("Invalid offset from existing annotation", e);
      }
    }
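
The offset arithmetic in the PageSpans rule can be tried outside GATE. The following is a minimal plain-Java sketch, not GATE code: the `Marker` and `PageSpan` records, the `pageSpans` method and the regex used in place of the gazetteer are all hypothetical stand-ins. Each page runs from the start of its marker to the start of the next one, and the last page runs to the end of the text, just as `end(doc)` does in the rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch (no GATE dependency); all names here are hypothetical.
public class PageSpanSketch {

    /** A page marker such as "Page-1": its start offset and covered text. */
    record Marker(long start, String text) {}

    /** A half-open span [start, end) labelled with the text of its page marker. */
    record PageSpan(String id, long start, long end) {}

    // Hypothetical stand-in for the gazetteer: find "Page-<n>" with a regex.
    static List<Marker> findMarkers(String text) {
        List<Marker> markers = new ArrayList<>();
        Matcher m = Pattern.compile("Page-\\d+").matcher(text);
        while (m.find()) {
            markers.add(new Marker(m.start(), m.group()));
        }
        return markers;
    }

    // Same arithmetic as the PageSpans rule: from this marker's start to the
    // next marker's start, or to the end of the document for the last one.
    static List<PageSpan> pageSpans(List<Marker> markers, long docLength) {
        List<Marker> sorted = new ArrayList<>(markers);
        sorted.sort(Comparator.comparingLong(Marker::start)); // document order
        List<PageSpan> spans = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            long end = (i + 1 < sorted.size())
                    ? sorted.get(i + 1).start() // start of the next marker, or...
                    : docLength;                // ...end of document for the last page
            spans.add(new PageSpan(sorted.get(i).text(), sorted.get(i).start(), end));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Page-1\nAge:53\nPage-2\nHistory: Yes\n";
        for (PageSpan s : pageSpans(findMarkers(text), text.length())) {
            System.out.println(s);
        }
    }
}
```

Running `main` prints `PageSpan[id=Page-1, start=0, end=14]` followed by `PageSpan[id=Page-2, start=14, end=34]`. Using the start of the next marker (rather than its end) as the boundary keeps the pages contiguous, so every character after the first marker belongs to exactly one page.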
    

    In your comment you ask about moving all the annotations under each "page" into a separate annotation set. Once you have done the above, this is relatively straightforward, provided the page number is stored as a feature on your Page annotations (as I have done with the "id" feature). You could then define another JAPE grammar that does something like this:

    Imports: { import static gate.Utils.*; }
    Phase: SetPerPage
    Input: Age X Y // and whatever other annotation types you want to copy
    Options: control = all
    
    Rule: MoveToPageSet
    ({Age}|{X}|{Y}):entity
    -->
    :entity {
      try {
        for(Annotation e : entityAnnots) {
          // find the (only) Page annotation that covers this entity
          Annotation thePage = getOnlyAnn(getCoveringAnnotations(inputAS, e, "Page"));
          // get the corresponding annotation set
          AnnotationSet pageSet = doc.getAnnotations(
                  (String)thePage.getFeatures().get("id"));
          // and copy the annotation into it
          pageSet.add(start(e), end(e), e.getType(), e.getFeatures());
        }
      } catch(InvalidOffsetException e) {
        throw new JapeException("Invalid offset from existing annotation", e);
      }
      // optionally remove from input set
      // inputAS.removeAll(entityAnnots);
    }
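
To see which page each entity lands in, here is a similar plain-Java sketch (again with hypothetical names, and independent of GATE) of the lookup done by `getCoveringAnnotations` and `getOnlyAnn`. A page covers an entity when it starts at or before the entity and ends at or after it; each entity is then copied into a bucket keyed by the page id, standing in for the per-page annotation set. One design choice differs from the JAPE: an entity with no single covering page, such as text before the first page marker, is skipped here, whereas `getOnlyAnn` expects exactly one covering Page and so cannot handle that case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no GATE dependency); all names here are hypothetical.
public class GroupByPageSketch {

    /** A labelled half-open span [start, end): either a page or an entity. */
    record Span(String label, long start, long end) {}

    // Pages covering the entity: page.start <= entity.start && entity.end <= page.end,
    // intended to mirror the containment test behind getCoveringAnnotations.
    static List<Span> covering(List<Span> pages, Span entity) {
        List<Span> result = new ArrayList<>();
        for (Span p : pages) {
            if (p.start() <= entity.start() && entity.end() <= p.end()) {
                result.add(p);
            }
        }
        return result;
    }

    // Copy each entity into a bucket keyed by its page id, the analogue of
    // doc.getAnnotations(id) in the SetPerPage rule.
    static Map<String, List<Span>> groupByPage(List<Span> pages, List<Span> entities) {
        Map<String, List<Span>> byPage = new LinkedHashMap<>();
        for (Span e : entities) {
            List<Span> cover = covering(pages, e);
            if (cover.size() != 1) {
                continue; // e.g. text before the first page marker: no covering page
            }
            byPage.computeIfAbsent(cover.get(0).label(), k -> new ArrayList<>()).add(e);
        }
        return byPage;
    }

    public static void main(String[] args) {
        List<Span> pages = List.of(new Span("Page-1", 0, 14), new Span("Page-2", 14, 34));
        List<Span> entities = List.of(new Span("Age", 7, 13), new Span("History", 21, 33));
        System.out.println(groupByPage(pages, entities));
    }
}
```

Because adjacent pages only share a boundary offset, a non-empty entity can be covered by at most one page, which is why the single-page check is enough here.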