Combine two regexp grammars in nltk


I'm defining a noun phrase chunker with a regexp grammar in NLTK. The example provided by NLTK is:

grammar = "NP: {<DT>?<NNP>*<NN>}"

Then, given a sentence like "show me the Paris hospitals" and a variant of that grammar ending in <NNS>, the library detects the noun phrase:

>>> s
'show me the Paris hospitals'
>>> grammar = "NP: {<DT>?<NNP>*<NNS>}"
>>> nltk.RegexpParser(grammar).parse(nltk.pos_tag(nltk.word_tokenize(s)))
Tree('S', [('show', 'VB'), ('me', 'PRP'), Tree('NP', [('the', 'DT'), ('Paris', 'NNP'), ('hospitals', 'NNS')])])

Now, the sentence can be written another way: "show me the hospitals in Paris". To match it, I need to change the grammar to:

>>> grammar = "NP: {<DT>?<NNS><IN><NNP>}"
>>> s = "show me the hospitals in Paris"
>>> nltk.RegexpParser(grammar).parse(nltk.pos_tag(nltk.word_tokenize(s)))
Tree('S', [('show', 'VB'), ('me', 'PRP'), Tree('NP', [('the', 'DT'), ('hospitals', 'NNS'), ('in', 'IN'), ('Paris', 'NNP')])])

How do I combine the two grammars into a single one? I couldn't figure out how to express an OR condition between the two patterns.


Solution

  • You can just define two NP rules in one grammar:

    grammar = """
    NP: {<DT>?<NNP>*<NNS>}
    NP: {<DT>?<NNS><IN><NNP>}
    """
    

    or use | as the OR operator inside a single rule:

    grammar = "NP: {<DT>?<NNP>*<NNS>|<DT>?<NNS><IN><NNP>}"
    

    Full example:

    import nltk
    
    sentence_1 = 'show me the Paris hospitals'
    sentence_2 = "show me the hospitals in Paris"
    
    grammar_1 = """
    NP: {<DT>?<NNP>*<NNS>}
    NP: {<DT>?<NNS><IN><NNP>}
    """
    parser_1 = nltk.RegexpParser(grammar_1)
    
    grammar_2 = "NP: {<DT>?<NNP>*<NNS>|<DT>?<NNS><IN><NNP>}"
    parser_2 = nltk.RegexpParser(grammar_2)
    
    for s in sentence_1, sentence_2:
        tokens = nltk.word_tokenize(s)
        pos_tags = nltk.pos_tag(tokens)
        print(parser_1.parse(pos_tags))
        print(parser_2.parse(pos_tags))
    
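    # both parsers give the same output; note that for the second sentence
    # the first pattern matches first, so only "the hospitals" is chunked: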
    # (S show/VB me/PRP (NP the/DT Paris/NNP hospitals/NNS))
    # (S show/VB me/PRP (NP the/DT Paris/NNP hospitals/NNS))
    # (S show/VB me/PRP (NP the/DT hospitals/NNS) in/IN Paris/NNP)
    # (S show/VB me/PRP (NP the/DT hospitals/NNS) in/IN Paris/NNP)
    

    (link to the documentation)
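
One caveat about the output above: in both combined grammars the shorter pattern is tried first, so "the hospitals in Paris" ends up chunked only as "the hospitals". A minimal sketch of one possible fix, assuming you want the longer phrase as a single NP, is to list the longer pattern first. The tagged tokens below are hard-coded from the outputs shown in the question so the snippet runs without the tokenizer and tagger data, and `np_chunks` is a hypothetical helper name introduced only for illustration.

```python
import nltk

# POS-tagged tokens copied from the outputs in the question
# (hard-coded so that no tokenizer or tagger data is required).
tagged_1 = [("show", "VB"), ("me", "PRP"), ("the", "DT"),
            ("Paris", "NNP"), ("hospitals", "NNS")]
tagged_2 = [("show", "VB"), ("me", "PRP"), ("the", "DT"),
            ("hospitals", "NNS"), ("in", "IN"), ("Paris", "NNP")]

# Same two patterns as in the answer, but with the longer one listed first.
grammar_rules = """
NP: {<DT>?<NNS><IN><NNP>}
NP: {<DT>?<NNP>*<NNS>}
"""
grammar_alt = "NP: {<DT>?<NNS><IN><NNP>|<DT>?<NNP>*<NNS>}"


def np_chunks(parser, tagged):
    """Return the words of each NP chunk as one string (illustrative helper)."""
    tree = parser.parse(tagged)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(lambda t: t.label() == "NP")
    ]


for grammar in (grammar_rules, grammar_alt):
    parser = nltk.RegexpParser(grammar)
    print(np_chunks(parser, tagged_1))  # ['the Paris hospitals']
    print(np_chunks(parser, tagged_2))  # ['the hospitals in Paris']
```

Since earlier patterns claim tokens before later ones get a chance, it is probably worth checking any reordering against a few tagged sentences from your own data.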