I have been trying to extract all the nouns, verbs, etc. separately from the Brown corpus. The code I tried
apparently works with WordNet only. I am using Python 3.4, by the way.
@alvas's answer worked, but when I use the result with random I get an error. Have a look:
nn = {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
the output is
{'such', 'rather', 'Quite', 'Such', 'quite'}
but when I use random.choice(nn)
I get
Traceback (most recent call last):
  File "/home/aziz/Desktop/2222.py", line 5, in <module>
  File "/usr/lib/python3.4/random.py", line 256, in choice
    return seq[i]
TypeError: 'set' object does not support indexing
>>> from nltk.corpus import brown
>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
In long
Iterate through brown.tagged_words(), which returns a list of ('word', 'POS') tuples:
>>> from nltk.corpus import brown
>>> brown.tagged_words()
[(u'The', u'AT'), (u'Fulton', u'NP-TL'), ...]
Please read this chapter to learn how the NLTK corpora API works: http://www.nltk.org/book/ch02.html
Then, use a set comprehension over it to collect the unique words whose tags start with the noun prefix, e.g. NN, NNS, etc. Note that in the Brown tagset proper nouns are tagged NP (as in u'NP-TL' above), so the NN prefix does not match them.
>>> {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
Note that the output might not be what you expect, because a word that is POS-tagged as a syntactic noun is not necessarily a semantic argument/entity.
Also, I don't think that the words you've extracted are correct. Double checking the list:
>>> nouns = {word for word, pos in brown.tagged_words() if pos.startswith('NN')}
>>> 'rather' in nouns
>>> 'such' in nouns
>>> 'Quite' in nouns
>>> 'quite' in nouns
>>> 'Such' in nouns
The output of the set comprehension: http://pastebin.com/bJaPdpUk
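Since the question asks for nouns, verbs, etc. separately, the same comprehension can be extended to several tag prefixes in one pass. This is only a minimal sketch, not part of NLTK: the helper name words_by_pos, the prefix mapping, and the small hand-written sample are assumptions made for illustration. With NLTK installed and the corpus downloaded, brown.tagged_words() could be passed in place of the sample.

```python
from collections import defaultdict

# Hypothetical mapping from a category label to a Brown-style tag prefix.
# This is an assumption for illustration; adjust it to the tags you need.
PREFIXES = {'noun': 'NN', 'verb': 'VB', 'adjective': 'JJ'}

def words_by_pos(tagged_words, prefixes=PREFIXES):
    """Group unique words by category, given an iterable of (word, tag) pairs."""
    groups = defaultdict(set)
    for word, tag in tagged_words:
        for label, prefix in prefixes.items():
            if tag.startswith(prefix):
                groups[label].add(word)
    return dict(groups)

# Hand-written sample in the same (word, tag) shape as brown.tagged_words().
sample = [('The', 'AT'), ('jury', 'NN'), ('said', 'VBD'),
          ('recent', 'JJ'), ('elections', 'NNS'), ('quite', 'QL')]

groups = words_by_pos(sample)
print(sorted(groups['noun']))   # ['elections', 'jury']
print(sorted(groups['verb']))   # ['said']
```

A single pass over the tagged words is used here because iterating the whole corpus once per category would repeat the same work for every prefix.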
Why does random.choice(nn) fail when nn is a set?
The input to random.choice() must be a sequence (see https://docs.python.org/2/library/random.html#random.choice):
Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
The sequence types in Python are:
- str, unicode, list, tuple, bytearray, buffer, xrange in Python 2.x (see https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange)
- list, tuple, range in Python 3.x (see https://docs.python.org/3.6/library/stdtypes.html#sequence-types-list-tuple-range)
- bytes, bytearray, memoryview in Python 3.x
- str in Python 3.x
Since set isn't a sequence, random.choice() cannot index into it, so you get the TypeError from the traceback ('set' object does not support indexing).
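One possible way around this, sketched here under the assumption that nn is a set of strings like the one built earlier, is to copy the set into a list before calling random.choice(). The small set below is only a stand-in for the real nn.

```python
import random
from collections.abc import Sequence

# Stand-in for the set built from brown.tagged_words(); the words are illustrative.
nn = {'jury', 'election', 'county'}

# A set is not a sequence, so it cannot be indexed; a list can.
print(isinstance(nn, Sequence))        # False
print(isinstance(list(nn), Sequence))  # True

# Convert once, then draw as often as needed.
# sorted() returns a list and keeps the order reproducible between runs.
nn_list = sorted(nn)
word = random.choice(nn_list)
print(word in nn)                      # True
```

Converting once and reusing nn_list avoids rebuilding the list on every draw, which matters when the set holds thousands of words.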