
Get token string from tokenID using Stanford Parser in GATE


I am trying to use some Java RHS code to get the string values of dependent tokens using the Stanford dependency parser in GATE, and add them as features of a new annotation.

I am having problems targeting just the 'dependencies' feature of the token, and getting the string value from the token ID.

Using the code below to retrieve only the 'dependencies' feature throws a Java NullPointerException:

for (Annotation lookupAnn : tokens.inDocumentOrder()) {
  FeatureMap lookupFeatures = lookupAnn.getFeatures();
  token = lookupFeatures.get("dependencies").toString();
}

I can use the following to get all the features of a token:

gate.Utils.inDocumentOrder

but it returns all of them, including the IDs of the dependent tokens, e.g.:

dependencies = [nsubj(8390), dobj(8394)]

I would like to get just the dependent tokens' string values from these token IDs.

Is there any way to access the dependent tokens' string values and add them as features of the new annotation?

Many thanks for your help.


Solution

  • Here is a working JAPE example. It only prints to GATE's message window (standard output); it doesn't create any new annotations with the features you asked for. Please finish that part yourself...

    The Stanford_CoreNLP plugin has to be loaded in GATE for this JAPE file to load. Otherwise you will get a class not found error for the DependencyRelation class.

    Imports: {
      import gate.stanford.DependencyRelation;
    }
    
    Phase: GetTokenDepsPhase
    Input: Token
    Options: control = all
    Rule: GetTokenDepsRule
    (
      {Token}
    ): token
    --> 
    :token {
      //note that tokenAnnots contains only a single annotation so the loop could be avoided...
      for (Annotation token : tokenAnnots) {
        Object deps = token.getFeatures().get("dependencies");
    
        //sometimes the dependencies feature is missing - skip it
        if (deps == null) continue;
    
        //token.getFeatures().get("string") could be used instead of gate.Utils.stringFor(doc,token)...
        System.out.println("Dependencies for token " + gate.Utils.stringFor(doc, token));
    
        //the dependencies feature has to be cast to List<DependencyRelation>
        List<DependencyRelation> typedDeps = (List<DependencyRelation>) deps;
        for (DependencyRelation r : typedDeps) {
    
          //use DependencyRelation.getTargetId() to get the id of the target token
          //use inputAS.get(id) to get the annotation for its id
          Annotation targetToken = inputAS.get(r.getTargetId());
    
          //inputAS.get(id) returns null if no annotation with that id is in the input set - skip it
          if (targetToken == null) continue;
    
          //use DependencyRelation.getType() to get the dependency type
          System.out.println("  " + r.getType() + ": " + gate.Utils.stringFor(doc, targetToken));
        }
      }
    }
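The answer leaves building the new feature as an exercise. The lookup itself can be modelled outside GATE, so here is a plain-Java sketch of that step (not GATE code). Assumptions: the `Relation` record is a hypothetical stand-in for `gate.stanford.DependencyRelation` that only mimics `getType()` and `getTargetId()`, a `Map` from token ID to text stands in for `inputAS.get(id)` plus `gate.Utils.stringFor(doc, ...)`, and the feature name `dependentStrings` and the example words are made up for illustration (only the token IDs come from the question).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DependencyFeatureSketch {

    // Hypothetical stand-in for gate.stanford.DependencyRelation:
    // only the two accessors used in the JAPE rule are modelled.
    record Relation(String type, int targetId) {
        String getType() { return type; }
        Integer getTargetId() { return targetId; }
    }

    // Resolves each relation's target ID to the target token's text and
    // returns entries such as "nsubj=John". Unknown IDs are skipped,
    // mirroring the null check on inputAS.get(id) in the JAPE rule.
    static List<String> resolveDependencies(List<Relation> deps, Map<Integer, String> tokenText) {
        List<String> resolved = new ArrayList<>();
        for (Relation r : deps) {
            String target = tokenText.get(r.getTargetId());
            if (target == null) continue;
            resolved.add(r.getType() + "=" + target);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Token IDs from the question's output; the words are invented.
        Map<Integer, String> tokenText = new HashMap<>();
        tokenText.put(8390, "John");
        tokenText.put(8394, "ball");

        List<Relation> deps = List.of(new Relation("nsubj", 8390), new Relation("dobj", 8394));

        // Stand-in for the FeatureMap of the new annotation (hypothetical feature name).
        Map<String, Object> newFeatures = new HashMap<>();
        newFeatures.put("dependentStrings", resolveDependencies(deps, tokenText));
        System.out.println(newFeatures); // prints {dependentStrings=[nsubj=John, dobj=ball]}
    }
}
```

In the real RHS, the resulting list could be put into a `FeatureMap` obtained from `gate.Factory.newFeatureMap()` and attached to a new annotation with `outputAS.add(start, end, type, features)`, which declares the checked `InvalidOffsetException`; treat this as a sketch and check it against your GATE version.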