Tags: python, nlp, nltk, semantics, context-free-grammar

Specify NLTK feature grammar within Python function in code


I have parsed an input string by loading a grammar from a .fcfg file, as shown in the NLTK book. Is there any way to specify this grammar within the Python function itself?

Grammar:

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'

I need this because the grammar has to be created dynamically, based on the input string, before parsing.


Solution

  • Yes, use the nltk.grammar.FeatureGrammar.fromstring() class method, which builds the grammar from a string, e.g.

    from nltk.grammar import FeatureGrammar
    from nltk.parse.generate import generate
    
    # The grammar from the question as a plain string. There is no "% start"
    # directive, so the left-hand side of the first rule (S) is the start symbol.
    g = """
    S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
    VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
    VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
    NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
    PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
    AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
    NP[SEM='Country="greece"'] -> 'Greece'
    NP[SEM='Country="china"'] -> 'China'
    Det[SEM='SELECT'] -> 'Which' | 'What'
    N[SEM='City FROM city_table'] -> 'cities'
    IV[SEM=''] -> 'are'
    A[SEM=''] -> 'located'
    P[SEM=''] -> 'in'
    """
    
    # Build the feature grammar directly from the string; no .fcfg file is needed.
    feature_grammar = FeatureGrammar.fromstring(g)
    
    # Print the first 30 sentences the grammar can generate.
    for sent in generate(feature_grammar, n=30):
        print(sent)
    

    [out]:

    ['Which', 'cities', 'are', 'in', 'Which', 'cities']
    ['Which', 'cities', 'are', 'in', 'What', 'cities']
    ['Which', 'cities', 'are', 'in', 'Greece']
    ['Which', 'cities', 'are', 'in', 'China']
    ['Which', 'cities', 'are', 'located', 'in', 'Which', 'cities']
    ['Which', 'cities', 'are', 'located', 'in', 'What', 'cities']
    ['Which', 'cities', 'are', 'located', 'in', 'Greece']
    ['Which', 'cities', 'are', 'located', 'in', 'China']
    ['What', 'cities', 'are', 'in', 'Which', 'cities']
    ['What', 'cities', 'are', 'in', 'What', 'cities']
    ['What', 'cities', 'are', 'in', 'Greece']
    ['What', 'cities', 'are', 'in', 'China']
    ['What', 'cities', 'are', 'located', 'in', 'Which', 'cities']
    ['What', 'cities', 'are', 'located', 'in', 'What', 'cities']
    ['What', 'cities', 'are', 'located', 'in', 'Greece']
    ['What', 'cities', 'are', 'located', 'in', 'China']
    ['Greece', 'are', 'in', 'Which', 'cities']
    ['Greece', 'are', 'in', 'What', 'cities']
    ['Greece', 'are', 'in', 'Greece']
    ['Greece', 'are', 'in', 'China']
    ['Greece', 'are', 'located', 'in', 'Which', 'cities']
    ['Greece', 'are', 'located', 'in', 'What', 'cities']
    ['Greece', 'are', 'located', 'in', 'Greece']
    ['Greece', 'are', 'located', 'in', 'China']
    ['China', 'are', 'in', 'Which', 'cities']
    ['China', 'are', 'in', 'What', 'cities']
    ['China', 'are', 'in', 'Greece']
    ['China', 'are', 'in', 'China']
    ['China', 'are', 'located', 'in', 'Which', 'cities']
    ['China', 'are', 'located', 'in', 'What', 'cities']
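
Because the question asks for a grammar that depends on the input, here is a minimal sketch of one way to assemble the grammar string at run time and parse with it. The helper names build_grammar_text and question_to_sql are hypothetical (they are not part of NLTK), and the sketch assumes every country name is a single token that contains no quote characters. The SEM extraction follows the pattern used in the NLTK book for this grammar.

```python
# Fixed part of the grammar, copied from the question.
BASE_RULES = """\
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
"""


def build_grammar_text(countries):
    """Return the grammar text with one NP rule per country (hypothetical helper)."""
    # Assumption: each name is a single token with no quote characters in it.
    country_rules = [
        f"NP[SEM='Country=\"{name.lower()}\"'] -> '{name}'" for name in countries
    ]
    return BASE_RULES + "\n".join(country_rules) + "\n"


def question_to_sql(question, countries):
    """Parse a question with a grammar built on the fly (hypothetical helper)."""
    # Imported here so build_grammar_text stays usable without NLTK installed.
    from nltk.grammar import FeatureGrammar
    from nltk.parse import FeatureChartParser

    grammar = FeatureGrammar.fromstring(build_grammar_text(countries))
    parser = FeatureChartParser(grammar)
    # parse() raises ValueError if a token is not covered by the grammar.
    trees = list(parser.parse(question.split()))
    if not trees:
        return None  # every word is known, but no parse was found
    sem = trees[0].label()["SEM"]
    # Drop the empty SEM values contributed by 'are', 'located' and 'in'.
    return " ".join(part for part in sem if part)


if __name__ == "__main__":
    print(question_to_sql("What cities are located in China", ["Greece", "China"]))
```

With the same rules loaded from a file, the NLTK book reports SELECT City FROM city_table WHERE Country="china" for this query. Generating the lexical rules from a list keeps the structural rules in one place, so only the NP entries change from one input to the next.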