NLTK Context-Free Grammar Generation


I'm working on a parser for a non-English language that uses Unicode characters. For that, I decided to use NLTK.

But it requires a predefined context-free grammar as below:

  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with" 

In my app, I need to minimize hard-coding by using a rule-based grammar. For example, I can assume that any word ending in -ed or -ing is a verb, so that the grammar works for any given context.

How can I feed such grammar rules to NLTK, or generate them dynamically using a finite-state machine?


Solution

  • Maybe you're looking for CFG.fromstring() (formerly parse_cfg())?

    From Chapter 8 of the NLTK book (updated to NLTK 3.0):

    >>> import nltk
    >>> # Build the grammar from a plain string of production rules.
    >>> grammar = nltk.CFG.fromstring("""
    ... S -> NP VP
    ... VP -> V NP | V NP PP
    ... V -> "saw" | "ate"
    ... NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    ... Det -> "a" | "an" | "the" | "my"
    ... N -> "dog" | "cat" | "cookie" | "park"
    ... PP -> P NP
    ... P -> "in" | "on" | "by" | "with"
    ... """)
    >>> sent = "Mary saw Bob".split()
    >>> rd_parser = nltk.RecursiveDescentParser(grammar)
    >>> # parse() yields one tree per possible analysis of the sentence.
    >>> for p in rd_parser.parse(sent):
    ...     print(p)
    ...
    (S (NP Mary) (VP (V saw) (NP Bob)))
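
Since CFG.fromstring() takes an ordinary string, one way to avoid hard-coding the lexicon might be to generate the lexical rules from the input tokens and append them to a fixed set of structural rules. The sketch below is only an illustration under several assumptions: the names SUFFIX_RULES, CLOSED_CLASS, STRUCTURAL_RULES, guess_category, build_grammar_text and parse are hypothetical, the regular expression stands in for the finite-state rule from the question, the structural rules are a simplified variant of the grammar above, and any word not matched by a rule is treated as a noun. Words containing a double quote would need escaping, which is not handled here.

```python
import re

# Hypothetical suffix rules (assumed for illustration): pattern -> category.
# The -ed / -ing heuristic is the one from the question; it is crude and
# will also match words such as "red" or "king".
SUFFIX_RULES = [
    (re.compile(r".+(ed|ing)$"), "V"),
]

# Small hand-written closed-class lexicon (assumed); open-class words are guessed.
CLOSED_CLASS = {
    "a": "Det", "an": "Det", "the": "Det", "my": "Det",
    "in": "P", "on": "P", "by": "P", "with": "P",
}

# Fixed structural rules; only the lexical rules are generated per input.
STRUCTURAL_RULES = """\
S -> NP VP
VP -> V | V NP | V PP | V NP PP
PP -> P NP
NP -> N | Det N | Det N PP
"""


def guess_category(word):
    """Return a preterminal for a word: lexicon first, then suffix rules."""
    lower = word.lower()
    if lower in CLOSED_CLASS:
        return CLOSED_CLASS[lower]
    for pattern, category in SUFFIX_RULES:
        if pattern.match(lower):
            return category
    return "N"  # fallback assumption: unknown words are nouns


def build_grammar_text(tokens):
    """Return grammar text with one lexical rule per category seen in tokens."""
    lexicon = {}
    for word in tokens:
        lexicon.setdefault(guess_category(word), set()).add(word)
    lines = [STRUCTURAL_RULES.rstrip()]
    for category in sorted(lexicon):
        words = " | ".join('"%s"' % w for w in sorted(lexicon[category]))
        lines.append("%s -> %s" % (category, words))
    return "\n".join(lines)


def parse(sentence):
    """Build a grammar for this sentence and return all of its parse trees."""
    import nltk  # imported here so the grammar builder runs without NLTK
    tokens = sentence.split()
    grammar = nltk.CFG.fromstring(build_grammar_text(tokens))
    return list(nltk.ChartParser(grammar).parse(tokens))
```

Calling build_grammar_text("Mary walked the dog".split()) returns the structural rules followed by generated rules for Det, N and V, and parse("Mary walked the dog") hands that text to CFG.fromstring() and a chart parser. For a real language, the suffix patterns and the closed-class list would have to be replaced with ones suited to it.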