
Grammar rule extraction from parsed result


I get the following result when I run the Stanford parser from NLTK:

(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))

but I need it in this form:

S -> VP
VP -> VB NP ADVP
VB -> get
PRP -> me
RB -> now

How can I get this result, perhaps with a recursive function? Or is there already a built-in function for this?


Solution

  • First, to navigate a tree, see the questions "How to iterate through all nodes of a tree?" and "How to navigate a nltk.tree.Tree?":

    >>> from nltk.tree import Tree
    >>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
    >>> ptree = Tree.fromstring(bracket_parse)
    >>> ptree
    Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
    >>> for subtree in ptree.subtrees():
    ...     print(subtree)
    ... 
    (S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
    (VP (VB get) (NP (PRP me)) (ADVP (RB now)))
    (VB get)
    (NP (PRP me))
    (PRP me)
    (ADVP (RB now))
    (RB now)
    

    What you're looking for is the built-in Tree.productions() method (https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341):

    >>> ptree.productions()
    [S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
    

    Note that Tree.productions() returns a list of Production objects, not strings; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.

    If you want a string form of the grammar rules, you can either do:

    >>> for rule in ptree.productions():
    ...     print(rule)
    ... 
    S -> VP
    VP -> VB NP ADVP
    VB -> 'get'
    NP -> PRP
    PRP -> 'me'
    ADVP -> RB
    RB -> 'now'
    

    Or

    >>> rules = [str(p) for p in ptree.productions()]
    >>> rules
    ['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
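For the recursive approach the question asks about, here is a minimal plain-Python sketch that does not need NLTK at all. The helpers parse_bracketed and extract_rules are hypothetical names written for illustration, not NLTK API, and the parser assumes well-formed bracketed input. extract_rules walks the tree pre-order, which is the same order Tree.productions() returns above, but it leaves the words unquoted, matching the format in the question. (With NLTK you could get the same unquoted form by joining each Production's lhs() and rhs() yourself.)

```python
import re


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Hypothetical helper for illustration: it assumes well-formed input such
    as "(S (VP (VB get)))" and does no error recovery.
    """
    # Split into "(", ")" and bare tokens (node labels or leaf words).
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # recurse into a subtree
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return node()


def extract_rules(tree):
    """Yield "LHS -> RHS" strings for every node, pre-order (hypothetical helper)."""
    label, children = tree
    # A child is either a subtree (use its label) or a leaf word (use it as-is).
    rhs = [child[0] if isinstance(child, tuple) else child for child in children]
    yield f"{label} -> {' '.join(rhs)}"
    # Recurse into subtrees only; leaf words have no rules of their own.
    for child in children:
        if isinstance(child, tuple):
            yield from extract_rules(child)


if __name__ == "__main__":
    tree = parse_bracketed("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
    for rule in extract_rules(tree):
        print(rule)
```

Running it prints the same seven rules as ptree.productions(), only without quotes around the words (for example VB -> get instead of VB -> 'get').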