python parsing nlp nltk context-free-grammar

How to extract grammar production rules from bracketed parses?


I have a sample sentence, "Open the door.", which I parsed to get the bracketed parse output below.

(S (VP (VB open) (NP (DT the) (NN door))) (. .))

I need to extract the CFG grammar rules that produce the parsed output. I can manually write them out as such:

from nltk import CFG

grammar = CFG.fromstring("""
S -> VP NP
NP -> Det N
VP -> V
Det -> 'the'
N -> 'door'
V -> 'Open'
""")

But that is time-consuming. How do I produce the grammar rules automatically from the bracketed parse?


Solution

  • You can use the Tree.productions() method to get the CFG rules from a Tree.

    Example:

    from nltk import Tree
    
    t = Tree.fromstring("(S (VP (VB open) (NP (DT the) (NN door))) (. .))")
    print(t.productions())
    

    Output:

    [S -> VP ., VP -> VB NP, VB -> 'open', NP -> DT NN, DT -> 'the', 
     NN -> 'door', . -> '.']
    

    For more information, see the NLTK documentation for Tree.productions().
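
To see roughly what happens when productions are read off a bracketed parse, here is a minimal, dependency-free sketch. It is not NLTK's implementation: `parse_bracketed` and `productions` are hypothetical helper names, rules are returned as plain strings rather than NLTK `Production` objects, and the tokenizer assumes that leaves never contain literal parentheses.

```python
def parse_bracketed(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes well-formed input whose leaves
    contain no literal parentheses.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2  # skip "(" and the label
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested subtree
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def productions(tree):
    """Return one 'LHS -> RHS' string per internal node, parent first."""
    label, children = tree
    # Terminals are quoted; nonterminals appear as bare labels.
    rhs = [repr(c) if isinstance(c, str) else c[0] for c in children]
    rules = ["%s -> %s" % (label, " ".join(rhs))]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


tree = parse_bracketed("(S (VP (VB open) (NP (DT the) (NN door))) (. .))")
for rule in productions(tree):
    print(rule)
```

For the sample parse this walks the tree top-down and yields the same seven rules as the Tree.productions() output above, one per line. For real work the NLTK method is the better choice, since it returns proper production objects.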