
Randomly generated parse tree using a fixed vocabulary


I am using Python 3.2 and have tried to build a randomly generated parse tree for a sentence. Although I am sure it generates a sentence, I am not sure how random the parse tree is, and I do not know whether there is a better or more efficient way to write this code. (I am new to programming and Python, and have recently become interested in NLP. Any advice, solutions or corrections are welcome.)

    N = ['man', 'dog', 'cat', 'telescope', 'park']  # nouns
    P = ['in', 'on', 'by', 'with']   # prepositions
    det = ['a', 'an', 'the', 'my']   # determiners
    V = ['saw', 'ate', 'walked']     # verbs
    NP = ['John', 'Mary', 'Bob']     # noun phrases (proper names)


    from random import choice

    PP = choice(NP) + ' ' + choice(P)   # prepositional phrase
    PP = ''.join(PP)
    VP = ''.join(choice(V) + ' ' + choice(NP)) or ''.join(choice(V) + ' ' + choice(NP) + ' ' + PP)  # verb phrase
    VP = ''.join(VP)
    S = choice(NP) + ' ' + VP  # sentence
    print(S)

Solution

  • Try NLTK, see http://nltk.org/book/ch08.html

    import nltk
    from random import choice, shuffle, random
    
    # I sometimes find it helps to store the terminals in a dict keyed by POS.
    vocab={
    'Det':['a','an','the','my'],
    'N':['man','dog','cat','telescope','park'],
    'V':['saw','ate','walked'],
    'P':['in','on','by','with'],
    'NP':['John','Mary','Bob']
    }
    
    vocab2string = [pos + " -> '" + "' | '".join(vocab[pos])+"'" for pos in vocab]
    
    # Rules are simpler to craft by hand, so I left them as a string
    rules = '''
    S -> NP VP
    VP -> V NP
    VP -> V NP PP
    PP -> NP P
    NP -> Det N
    '''
    
    mygrammar = rules + "\n".join(vocab2string)
    grammar = nltk.CFG.fromstring(mygrammar)  # load the grammar (NLTK 3; NLTK 2 used nltk.parse_cfg)
    parser = nltk.ChartParser(grammar)        # build a chart parser for the grammar

    # Randomly select one terminal from each POS, in the spirit of the infinite
    # monkey theorem (words picked without regard to grammatical order), see
    # https://en.wikipedia.org/wiki/Infinite_monkey_theorem
    words = [choice(vocab[pos]) for pos in vocab if pos != 'P']   # without a PP
    # With a PP you need 3 NPs (Det N plus two names). This line overrides the
    # one above, so comment it out to generate sentences without a PP.
    words = [choice(vocab[pos]) for pos in vocab] + [choice(vocab['NP'])]

    # Shuffle until the word order parses, so the result is always grammatical
    trees = []
    while not trees:
        shuffle(words)
        trees = list(parser.parse(words))  # NLTK 3; NLTK 2 used parser.nbest_parse

    for t in trees:
        print(t)
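
On the question of how random the tree is: in the question's code, `A or B` always evaluates to `A` when `A` is a non-empty string, so the verb phrase with a PP is never produced. One alternative to shuffling words until they parse is to expand the grammar top-down and pick a production at random at each step. The following is only a minimal sketch using the standard library, not part of the answer above; `GRAMMAR`, `random_tree` and `leaves` are hypothetical names, and choosing uniformly among productions is an assumption.

```python
import random

# Hypothetical grammar table mirroring the rules and vocabulary above.
# Each symbol maps to a list of productions; anything not in the table is a word.
GRAMMAR = {
    'S':   [['NP', 'VP']],
    'VP':  [['V', 'NP'], ['V', 'NP', 'PP']],
    'PP':  [['NP', 'P']],
    'NP':  [['Det', 'N'], ['John'], ['Mary'], ['Bob']],
    'Det': [['a'], ['an'], ['the'], ['my']],
    'N':   [['man'], ['dog'], ['cat'], ['telescope'], ['park']],
    'V':   [['saw'], ['ate'], ['walked']],
    'P':   [['in'], ['on'], ['by'], ['with']],
}


def random_tree(symbol, rng=random):
    """Expand symbol top-down, picking one production uniformly at random."""
    if symbol not in GRAMMAR:  # terminal: a plain word
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    # A tree is a tuple: (label, child, child, ...)
    return (symbol,) + tuple(random_tree(s, rng) for s in production)


def leaves(tree):
    """Collect the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


tree = random_tree('S')
print(tree)                    # nested tuples describing the parse tree
print(' '.join(leaves(tree)))  # the sentence read off the tree
```

Because every sentence is built from the rules, it is grammatical by construction and comes with exactly one tree. Passing a seeded `random.Random` instance as `rng` makes the output reproducible.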