
Convert words between verb/noun/adjective forms


I would like a Python library function that converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns from the verb "to code": one is the subject, the other is the object).

    # Signature of each function: String => List of String
    print(verbify('writer'))      # => ['write']
    print(nounize('written'))     # => ['writer']
    print(adjectivate('write'))   # => ['written']

I mostly care about verbs <=> nouns, for a note-taking program I want to write. For example, I could write "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP the program could figure out that they mean the same thing. (I know that's not easy, and that it will take NLP that parses rather than just tags, but I want to hack up a prototype.)
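To make the goal concrete, here is a toy sketch of the normalization step for exactly these two sentence shapes. It does no real parsing: the two regular expressions, the `normalize` function and the `AGENT_NOUN_TO_VERB` table are hypothetical, hand-written stand-ins for what a real prototype would get from a parser and from a lexical resource such as WordNet.

```python
import re

# Hypothetical, hand-written agent-noun -> verb table. A real prototype would
# derive this from a lexical resource (e.g. WordNet) instead.
AGENT_NOUN_TO_VERB = {
    "antagonist": "antagonize",
    "inhibitor": "inhibit",
}


def normalize(sentence):
    """Reduce two toy sentence shapes to one (subject, verb, object) triple.

    Returns None for anything that matches neither shape.
    """
    text = sentence.strip().rstrip(".")

    # Shape 1: "<subject> is a/an <object> <agent noun>"
    m = re.fullmatch(r"(\w+) is an? (\w+) (\w+)", text)
    if m and m.group(3) in AGENT_NOUN_TO_VERB:
        return (m.group(1), AGENT_NOUN_TO_VERB[m.group(3)], m.group(2))

    # Shape 2: "<subject> <verb>s <object>" (third-person singular verb)
    m = re.fullmatch(r"(\w+) (\w+?)s (\w+)", text)
    if m:
        return (m.group(1), m.group(2), m.group(3))

    return None


print(normalize("caffeine antagonizes A1"))       # ('caffeine', 'antagonize', 'A1')
print(normalize("caffeine is an A1 antagonist"))  # ('caffeine', 'antagonize', 'A1')
```

Both sentences collapse to the same triple, which is the property the note-taking program needs; the hard part the question mentions (parsing arbitrary sentences and deriving the noun/verb table automatically) is exactly what this sketch leaves out.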

Similar question: Converting adjectives and adverbs to their noun forms (its answer only stems down to the root POS; I want to convert between POSes).

P.S. This is called conversion in linguistics: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29


Solution

  • This is more of a heuristic approach. I have just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well:

    from nltk.corpus import wordnet as wn
    
    # Requires the WordNet corpus: run nltk.download('wordnet') once.
    # Uses the NLTK 3 API, where lemmas(), name(), synset() and pos() are methods.
    
    def nounify(verb_word):
        """ Transform a verb to the closest noun: die -> death """
        verb_synsets = wn.synsets(verb_word, pos=wn.VERB)
    
        # Word not found
        if not verb_synsets:
            return []
    
        # Get all verb lemmas of the word
        verb_lemmas = [l for s in verb_synsets
                       for l in s.lemmas() if s.pos() == 'v']
    
        # Get related forms
        derivationally_related_forms = [(l, l.derivationally_related_forms())
                                        for l in verb_lemmas]
    
        # Filter only the nouns
        related_noun_lemmas = [l for drf in derivationally_related_forms
                               for l in drf[1] if l.synset().pos() == 'n']
    
        # Extract the words from the lemmas
        words = [l.name() for l in related_noun_lemmas]
        len_words = len(words)
    
        # Build the result as a list of (word, probability) tuples, where the
        # "probability" is the word's share of all related noun lemmas
        result = [(w, float(words.count(w)) / len_words) for w in set(words)]
        result.sort(key=lambda w: -w[1])
    
        # Return all the possibilities sorted by probability
        return result
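The answer suggests verbify works analogously. Below is a hedged sketch of that generalization, assuming NLTK 3 with the WordNet corpus downloaded. `convert_pos` and `rank_by_frequency` are hypothetical helper names (not part of NLTK), and only the 'n' and 'v' tags are handled, since adjectives bring in WordNet's separate satellite tag 's'. The NLTK import sits inside `convert_pos` so the ranking helper can be reused without NLTK installed.

```python
from collections import Counter


def rank_by_frequency(words):
    """Turn candidate words into (word, share) tuples, most frequent first."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by descending share, then alphabetically so ties have a stable order
    return sorted(((w, c / total) for w, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def convert_pos(word, from_pos, to_pos):
    """Map `word` from one WordNet POS ('n' or 'v') to the other via
    derivationally related forms, ranked by how often each form appears."""
    from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')

    lemmas = [l for s in wn.synsets(word, pos=from_pos) for l in s.lemmas()]
    related = [r.name() for l in lemmas
               for r in l.derivationally_related_forms()
               if r.synset().pos() == to_pos]
    return rank_by_frequency(related)


def verbify(noun_word):
    return convert_pos(noun_word, 'n', 'v')


def nounify(verb_word):
    return convert_pos(verb_word, 'v', 'n')
```

Usage would be `verbify('writer')` or `nounify('die')`; the exact candidates and shares depend on the installed WordNet data. Unlike the original, ties are broken alphabetically instead of by arbitrary set order.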