I have a large set of WordNet synsets. A small portion of this set is:
syns = {"Synset('brutal.s.04')", "Synset('benignant.s.02')"}
I want to print out the synset term (the main lemma of the synset) for each synset in the set. For example, the output of the above set should be:
brutal, benignant
This is the code I used:
from nltk.corpus import wordnet as wn
for s in syns:
    print(wn.s.lemmas[0])
but this does not work, because s is considered a string, and not an object. I get the following error:
AttributeError: 'WordNetCorpusReader' object has no attribute 's'
I tried converting s to bytes like so:
s = bytes(s)
But that does not work either. What is the simplest way to print out only the lemma, as shown above?
I checked here, and this is a good way to do it, but my synsets are in string form, not actual objects.
Thanks in advance.
>>> syns = {"Synset('brutal.s.04')", "Synset('benignant.s.02')"}
>>> [wn.synset(i[8:-2]) for i in syns]
[Synset('benignant.s.02'), Synset('brutal.s.04')]
>>> syns = [wn.synset(i[8:-2]) for i in syns]
>>> syns[0].lemma_names()
[u'benignant', u'gracious']
Firstly, getting input in which objects have been printed out as strings is weird. The first intuitive approach would be something like ast.literal_eval() or eval() with the Synset type, https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L305 (but before that, see http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html):
>>> from nltk.corpus.reader.wordnet import Synset
>>> from nltk.corpus import wordnet as wn
>>> syns = {"Synset('brutal.s.04')", "Synset('benignant.s.02')"}
>>> [eval(i) for i in syns]
[Synset('None'), Synset('None')]
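For completeness, ast.literal_eval() fares no better than eval() here: it only accepts Python literals (strings, numbers, tuples, lists, dicts, sets and the like), and "Synset('brutal.s.04')" is a call expression, not a literal. A minimal check using only the standard library, written for Python 3 (try_literal_eval is a hypothetical helper name, not part of NLTK or ast):

```python
import ast

def try_literal_eval(text):
    """Return the parsed literal, or the exception ast.literal_eval raised."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as err:
        return err

# A call expression is rejected, because it is not a literal.
result = try_literal_eval("Synset('brutal.s.04')")
print(type(result).__name__)

# A plain quoted string, by contrast, is a literal and parses fine.
print(try_literal_eval("'brutal.s.04'"))
```

So neither route turns the string into a usable Synset.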
Apparently, the Synset class won't work independently of nltk.corpus.wordnet. So we take a look at the wordnet.synset() function instead (https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1217). It seems to take only the pre-assigned name of a Synset object, so:
>>> wn.synset('brutal.s.04')
Synset('brutal.s.04')
>>> type(wn.synset('brutal.s.04'))
<class 'nltk.corpus.reader.wordnet.Synset'>
Once each pseudo-synset string in your input syns has been turned into a real Synset, you can work with it as shown in "How do I print out just the word itself in a WordNet synset using Python NLTK?"
Back to your weird input syns, doing the following will give me the name of the synset:
>>> syns = {"Synset('brutal.s.04')", "Synset('benignant.s.02')"}
>>> list(syns)[0]
"Synset('benignant.s.02')"
>>> list(syns)[0][8:-2]
'benignant.s.02'
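The [8:-2] slice works because "Synset('" is 8 characters long and "')" is 2, but it silently returns garbage if a string is not in exactly that form. A slightly more defensive sketch uses a regular expression and fails loudly instead. The helper name synset_name and the pattern are my own, not part of NLTK, and I am assuming every input looks like Synset('...') with single quotes:

```python
import re

# Matches the printed form Synset('name') and captures the name inside.
_SYNSET_REPR = re.compile(r"^Synset\('(?P<name>.+)'\)$")

def synset_name(text):
    """Extract 'benignant.s.02' from "Synset('benignant.s.02')"."""
    match = _SYNSET_REPR.match(text.strip())
    if match is None:
        raise ValueError("not a printed Synset: %r" % (text,))
    return match.group("name")

print(synset_name("Synset('benignant.s.02')"))  # benignant.s.02
```

The returned name can be passed to wn.synset() exactly like the sliced string.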
So back to converting it into a Synset:
>>> syns = {"Synset('brutal.s.04')", "Synset('benignant.s.02')"}
>>> [wn.synset(i[8:-2]) for i in syns]
[Synset('benignant.s.02'), Synset('brutal.s.04')]
>>> syns = [wn.synset(i[8:-2]) for i in syns]
>>> syns[0].lemma_names()
[u'benignant', u'gracious']
But let's roll back altogether: you're getting a weird input syns because someone saved their output by simply calling str() on a Synset object:
>>> syns[0]
Synset('benignant.s.02')
>>> str(syns[0])
"Synset('benignant.s.02')"
The person could have simply done:
>>> syns[0].name()
u'benignant.s.02'
Then your input syns would look like this:
syns = {u'brutal.s.04', u'benignant.s.02'}
and to read it, you can simply do:
>>> from nltk.corpus import wordnet as wn
>>> syns = {u'brutal.s.04', u'benignant.s.02'}
>>> syns = [wn.synset(i) for i in syns]
>>> syns[0]
Synset('brutal.s.04')
>>> syns[0].lemma_names()
[u'brutal']
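To get the exact output asked for in the question ("brutal, benignant"), take the first lemma name of each synset. The sketch below is written for Python 3 and shows two routes. main_lemmas looks each name up in WordNet, so it needs NLTK and the WordNet data installed. head_word skips the lookup and rests on an assumption of mine: that a synset name has the form lemma.pos.sense and starts with the synset's first lemma in lowercase. Both function names are hypothetical.

```python
def head_word(name):
    """Return the lemma part of a synset name such as 'brutal.s.04'."""
    # Split from the right so a lemma that itself contains dots stays intact.
    lemma, _pos, _sense = name.rsplit(".", 2)
    return lemma

def main_lemmas(names):
    """Look each name up in WordNet and return its first lemma name."""
    # Imported here so head_word stays usable without NLTK installed.
    from nltk.corpus import wordnet as wn
    return [wn.synset(n).lemma_names()[0] for n in names]

names = ["brutal.s.04", "benignant.s.02"]
print(", ".join(head_word(n) for n in names))  # brutal, benignant
```

If the casing or exact spelling of the lemma matters, prefer main_lemmas, since it reads the lemma from WordNet rather than from the name string.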