python nlp nltk list-comprehension wordnet

Extract word from a list of synsets in NLTK for Python

Using [x for x in wn.all_synsets('n')], I can build a list, allnouns, containing every noun synset in WordNet via NLTK.

The list allnouns looks like this: Synset('pile.n.01'), Synset('compost_heap.n.01'), Synset('mass.n.03'), and so on. I can get any element by index, for example allnouns[2], which should be Synset('mass.n.03').

I would like to extract only the word mass, but I cannot treat the element like a string. Everything I try produces an AttributeError: 'Synset' object has no attribute, a TypeError: 'Synset' object is not subscriptable, or, if I try .name or .pos, <bound method Synset.name of Synset('mass.n.03')>.

Solution

Call the synset's name() method and split the result on the period:

>>> from nltk.corpus import wordnet as wn
>>> wn.synset('mass.n.03').name().split(".")[0]
'mass'

For your case:

>>> allnouns = [x for x in wn.all_synsets('n')]

The item at index 23 is Synset('substance.n.07'). You can extract the word from its name like this:

>>> allnouns[23].name().split(".")[0]
'substance'

If you want only the words for all of the noun synsets, use:

>>> [x.name().split(".")[0] for x in wn.all_synsets('n')]

This should give exactly the result you need.
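One caveat with split(".")[0]: it assumes the word itself contains no period. Here is a minimal sketch, assuming synset names always follow the word.pos.number pattern seen above, that splits from the right instead. The helper name word_from_synset_name and the dotted example name are hypothetical and not part of NLTK.

```python
def word_from_synset_name(synset_name):
    """Return the word part of a name shaped like 'word.pos.nn'.

    Hypothetical helper, not part of NLTK. Splitting from the right
    keeps any periods that belong to the word itself.
    """
    word, _pos, _number = synset_name.rsplit(".", 2)
    return word


print(word_from_synset_name("mass.n.03"))
print(word_from_synset_name("a.b.n.01"))  # made-up name with a dot in the word
```

With a real synset you would pass in the string returned by name(), for example word_from_synset_name(allnouns[2].name()).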

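Note: In NLTK's WordNet interface (NLTK 3.0 and later), name is a method of Synset, not an attribute, so it must be called with parentheses: name(). The same applies to pos(). Accessing .name without the parentheses is what produces the <bound method Synset.name of ...> output.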
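The difference between referring to a method and calling it can be reproduced without NLTK. The sketch below uses a hypothetical stand-in class, FakeSynset, which only mimics the name() method and is not NLTK's Synset.

```python
class FakeSynset:
    """Hypothetical stand-in for illustration; not NLTK's Synset class."""

    def __init__(self, name):
        self._name = name

    def name(self):
        # Return the stored name string, as a method call would.
        return self._name


s = FakeSynset("mass.n.03")

attribute_access = s.name  # the bound method object itself; nothing is called
call_result = s.name()     # the string returned by calling the method

print(callable(attribute_access))
print(call_result.split(".")[0])
```

Printing attribute_access directly would show a <bound method ...> representation, which matches the symptom described in the question.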