python nlp nltk stanford-nlp context-free-grammar

how to extract elements from tree.productions()

(1) My goal: to extract the left-hand side and the right-hand side of a production.

(2) My approach: I am using the Stanford parser and NLTK to extract the parse tree of a sentence. My code is below:

import json

import corenlp  # corenlp-python wrapper, implied by the corenlp.StanfordCoreNLP call below
from nltk import Tree

corenlp_dir = "/home/corenlp-python/stanford-corenlp-full-2013-11-12/"
parser = corenlp.StanfordCoreNLP(corenlp_path=corenlp_dir)

result_json = json.loads(parser.parse("I have a tree."))
for sentence in result_json["sentences"]:
    t = Tree.fromstring(sentence["parsetree"])
    print(t.productions())   # [ROOT -> S, S -> NP VP ., NP -> PRP, PRP -> 'I', VP -> VBP NP, VBP -> 'have', NP -> DT NN, DT -> 'a', NN -> 'tree', . -> '.']

    print(t.productions()[1])  # S -> NP VP .
    print(type(t.productions()[1]))  # <class 'nltk.grammar.Production'>

    # Iterating over the tree yields only its direct children, so I can only get one tree.
    for (i, child) in enumerate(t):
        print((i, child))  # (0, Tree('S', [Tree('NP', [Tree('PRP', ['I'])]), Tree('VP', [Tree('VBP', ['have']), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['tree'])])]), Tree('.', ['.'])]))

(3) My question: how can I go on to extract the elements on both sides of each production, such as 'S' and 'NP VP .'? Is there a method that does this?

Could anyone help me or point me in the right direction?

Solution

nltk.Tree is actually a subclass of the Python list, so you can access the children of any node c by c[0], c[1], c[2], etc. Note that NLTK trees are not explicitly binary by design, so your notion of "left" and "right" might have to be enforced somewhere in a contract.
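A minimal sketch of the list-style access described above, assuming NLTK 3 (where the node label is read with label(); older releases used a .node attribute). The bracketed string is reconstructed from the productions printed in the question, so no parser is needed to try it:

```python
from nltk import Tree

# Bracketed parse reconstructed from the productions shown in the question.
t = Tree.fromstring(
    "(ROOT (S (NP (PRP I)) (VP (VBP have) (NP (DT a) (NN tree))) (. .)))"
)

s = t[0]                                # ROOT has a single child, the S subtree
print(len(s))                           # S has three children, so it is not binary
print([child.label() for child in s])   # labels of the NP, VP and "." subtrees
print(s[1][1].leaves())                 # words under the object NP inside the VP
```

Here S has three children, which is why c[0] and c[1] alone would miss the final punctuation node.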

Assuming the tree is binary, you can access the left child of a node with c[0], and the right with c[1]. For your second task:

"But what I want to do is to extract the left-hand side of a production and gather the right-hand sides of all productions with the same left-hand side."

If I understand correctly, you can traverse the tree and build up a dict as you go, where the keys are left-hand sides and the values are lists of the right-hand sides seen for them. Because nltk.Tree is a mutable list subclass, Tree objects are not hashable and cannot be used as dict keys directly, but you can use the node label or the string form of the Tree as the key instead.
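If "left-hand side" and "right-hand side" refer to the two sides of a grammar production rather than to child nodes, the Production objects returned by tree.productions() expose them through lhs() and rhs(). A rough sketch of the dict idea on that basis, assuming NLTK 3; the name rhs_by_lhs and the choice to store symbols as strings are illustrative, not part of NLTK:

```python
from collections import defaultdict

from nltk import Tree

# Bracketed parse reconstructed from the productions shown in the question.
t = Tree.fromstring(
    "(ROOT (S (NP (PRP I)) (VP (VBP have) (NP (DT a) (NN tree))) (. .)))"
)

# Map each left-hand side to the distinct right-hand sides observed for it.
rhs_by_lhs = defaultdict(list)  # hypothetical name, chosen for this sketch
for prod in t.productions():
    lhs = str(prod.lhs())                         # Nonterminal, e.g. 'NP'
    rhs = tuple(str(sym) for sym in prod.rhs())   # Nonterminals and/or words
    if rhs not in rhs_by_lhs[lhs]:
        rhs_by_lhs[lhs].append(rhs)

for lhs, alternatives in rhs_by_lhs.items():
    print(lhs, "->", alternatives)
```

Converting symbols to strings keeps the keys simple and hashable; the Nonterminal objects returned by lhs() could also be used as keys directly if the distinction between nonterminals and plain words matters.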