python nlp nltk list-comprehension wordnet

Extract word from a list of synsets in NLTK for Python


Using allnouns = [x for x in wn.all_synsets('n')], I can build a list allnouns containing every noun synset in WordNet through NLTK.

The list allnouns looks like this: Synset('pile.n.01'), Synset('compost_heap.n.01'), Synset('mass.n.03'), and so on. I can get any element by index, for example allnouns[2], which should be Synset('mass.n.03').

I would like to extract only the word mass, but I cannot treat the synset like a string. Everything I try raises an AttributeError: 'Synset' object has no attribute or a TypeError: 'Synset' object is not subscriptable, or shows <bound method Synset.name of Synset('mass.n.03')> when I use .name or .pos.


Solution

  • How about trying this solution:

    >>> from nltk.corpus import wordnet as wn
    >>> wn.synset('mass.n.03').name().split(".")[0]
    'mass'
    

    For your case:

    >>> allnouns = [x for x in wn.all_synsets('n')]
    

    The item at index 23 is Synset('substance.n.07'). You can extract the word from its name like this:

    >>> allnouns[23].name().split(".")[0]
    'substance'
    

    If you want only the words for all noun synsets, collected in a list, use:

    >>> [x.name().split(".")[0] for x in wn.all_synsets('n')]
    

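    This should give exactly the result you need.

    Note: In current NLTK versions, name is a method of Synset, not an attribute, so it has to be called with parentheses as name(). Writing .name without the parentheses is what produces <bound method Synset.name of Synset('mass.n.03')>. The same applies to pos().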
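
One caveat with split(".")[0]: a synset name has the form word.pos.number, so splitting from the left would cut a word that itself contains a period. A minimal sketch of a slightly more defensive helper is below. The names synset_word and FakeSynset are made up for illustration; FakeSynset is only a stand-in with a name() method so the snippet runs without the WordNet corpus installed, and a real Synset from wn.all_synsets('n') should work the same way.

```python
def synset_word(synset):
    """Return the word part of a synset name such as 'mass.n.03'."""
    # name() is called as a method; rsplit works from the right, so only
    # the trailing '.pos.number' suffix is removed from the name.
    return synset.name().rsplit(".", 2)[0]


class FakeSynset:
    """Hypothetical stand-in for nltk's Synset (illustration only)."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


sample = [FakeSynset("mass.n.03"), FakeSynset("compost_heap.n.01")]
words = [synset_word(s) for s in sample]
print(words)  # ['mass', 'compost_heap']
```

Multi-word entries keep their underscores (compost_heap); if plain text is wanted, word.replace("_", " ") can be applied afterwards. With NLTK itself, the same list would be built as [synset_word(s) for s in wn.all_synsets('n')].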