Grammar with Gate ANNIE


Hello, I have been working on information retrieval for quite some time and have run into some difficulties. Recently I downloaded StandAloneAnnie.java from the following link:

http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/StandAloneAnnie.java

Although I have been able to run it and see the output, I have a couple of questions.

  1. This program annotates people and locations. Where is the grammar for annotating such entities stored?

  2. How can I write my own simple grammar to extract some data and use it in my copy of StandAloneAnnie.java?


Solution

  • The following is a simple grammar for tagging the height of a person:

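    Phase: Measurements
    Input: Token Number
    Options: control=appelt debug=true

    // A Number annotation followed by a unit token (ft, in or cm).
    // ==~ requires the whole token string to match, so a token such as
    // "minutes" is not mistaken for the unit "in".
    Rule: Height
    (
    ({Number})
    ( {Token.string==~"[Ff]t"} | {Token.string==~"[Ii]n"} | {Token.string==~"[Cc]m"})
    ):height
    -->
    :height.Height= {value= :height.Number.value, unit= :height.Token.string}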
    
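As a rough way to see what the Height rule looks for, the same shape (a number followed by ft, in or cm) can be sketched with java.util.regex. This is only an analogy and not how GATE works: JAPE matches over Token and Number annotations rather than raw characters. The class name HeightRegexSketch and the method findHeights are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this does not use GATE or JAPE.
// HeightRegexSketch and findHeights are hypothetical names.
class HeightRegexSketch {

    // A number, optional whitespace, then exactly one of the units
    // ft / in / cm (first letter in either case), mirroring the shape
    // of the Height rule.
    private static final Pattern HEIGHT =
            Pattern.compile("\\b(\\d+(?:\\.\\d+)?)\\s*([Ff]t|[Ii]n|[Cc]m)\\b");

    // Returns each match as "value unit".
    static List<String> findHeights(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HEIGHT.matcher(text);
        while (m.find()) {
            found.add(m.group(1) + " " + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
        System.out.println(findHeights(str));
    }
}
```

On the sample string used in the answer, only the number that is directly followed by a unit is picked up; the other numbers are followed by words that are not units, so they are skipped.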

    This is the main code that gets executed:

        // Imports needed by this method.
        import java.io.File;
        import java.net.URL;
        import gate.Annotation;
        import gate.AnnotationSet;
        import gate.Corpus;
        import gate.Document;
        import gate.Factory;
        import gate.FeatureMap;
        import gate.Gate;
        import gate.ProcessingResource;
        import gate.creole.ANNIEConstants;
        import gate.creole.SerialAnalyserController;

        public static void main(String[] args) throws Exception {

            Gate.init();

            // Plugins must be registered before their resources can be loaded.
            Gate.getCreoleRegister().registerDirectories(new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());
            Gate.getCreoleRegister().registerDirectories(new URL("file:///GATE_HOME/plugins/Tagger_Numbers")); // change this path

            // The string to be annotated.
            String str = "Height is 60 in. Weight is 150 lbs pulse rate 90 Pulse rate 90";
            Document doc = Factory.newDocument(str);

            Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
            corpus.add(doc);

            // Load the processing resources. See http://gate.ac.uk/gate/doc/plugins.html
            // for the class that each plugin provides.
            ProcessingResource token = (ProcessingResource) Factory.createResource("gate.creole.tokeniser.DefaultTokeniser", Factory.newFeatureMap());
            ProcessingResource sspliter = (ProcessingResource) Factory.createResource("gate.creole.splitter.SentenceSplitter", Factory.newFeatureMap());
            ProcessingResource number = (ProcessingResource) Factory.createResource("gate.creole.numbers.NumbersTagger", Factory.newFeatureMap());

            // JAPE transducer that runs the Height grammar above.
            // The file name height.jape is an assumed placeholder.
            FeatureMap japeParams = Factory.newFeatureMap();
            japeParams.put("grammarURL", new URL("file:///path/to/height.jape")); // change this path
            ProcessingResource jape = (ProcessingResource) Factory.createResource("gate.creole.Transducer", japeParams);

            /* The pipeline is the application that runs the resources loaded above.
               Resources must be added in dependency order: for example, the 'number'
               resource requires the document to be tokenised first, and the grammar
               needs the Token and Number annotations. */
            SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController", Factory.newFeatureMap(), Factory.newFeatureMap(), "ANNIE");
            pipeline.setCorpus(corpus);
            pipeline.add(token);
            pipeline.add(sspliter);
            pipeline.add(number);
            pipeline.add(jape);
            pipeline.execute();

            // Extract information from the annotated document.
            // "Height" is the annotation type created by the grammar above.
            String content = doc.getContent().toString();
            AnnotationSet ann = doc.getAnnotations();
            for (Annotation annotation : ann.get("Height")) {
                long start = annotation.getStartNode().getOffset();
                long end = annotation.getEndNode().getOffset();
                System.out.println(content.substring((int) start, (int) end));
            }
        }
    

    Note: The Height grammar above belongs in a .jape file, and it must be run by a JAPE (or JAPE Plus) transducer that is added to the pipeline after the resources it depends on. The main code then only needs to execute the application ('pipeline'). A tutorial on writing JAPE is available at gate.ac.uk/sale/tao