I have the following output, which I got by using nltk's .tokenize() and .pos_tag() and wordnet's .synsets(). The output is a list of lists of potential matches for each token of the document, together with WordNet's own part-of-speech tagging (here there are 4 tokens, hence 4 lists of matches):
[[Synset('document.n.01'),
Synset('document.n.02'),
Synset('document.n.03'),
Synset('text_file.n.01'),
Synset('document.v.01'),
Synset('document.v.02')],
[Synset('be.v.01'),
Synset('be.v.02'),
Synset('be.v.03'),
Synset('exist.v.01'),
Synset('be.v.05'),
Synset('equal.v.01'),
Synset('constitute.v.01'),
Synset('be.v.08'),
Synset('embody.v.02'),
Synset('be.v.10'),
Synset('be.v.11'),
Synset('be.v.12'),
Synset('cost.v.01')],
[Synset('angstrom.n.01'),
Synset('vitamin_a.n.01'),
Synset('deoxyadenosine_monophosphate.n.01'),
Synset('adenine.n.01'),
Synset('ampere.n.02'),
Synset('a.n.06'),
Synset('a.n.07')],
[Synset('trial.n.02'),
Synset('test.n.02'),
Synset('examination.n.02'),
Synset('test.n.04'),
Synset('test.n.05'),
Synset('test.n.06'),
Synset('test.v.01'),
Synset('screen.v.01'),
Synset('quiz.v.01'),
Synset('test.v.04'),
Synset('test.v.05'),
Synset('test.v.06'),
Synset('test.v.07')]]
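For context, the output above came from roughly this kind of pipeline (a simplified sketch, not my exact code; the input text "document is a test" is only an illustration, and the tags from pos_tag() are not passed on to synsets() here):

import nltk
from nltk.corpus import wordnet as wn

text = "document is a test"            # illustrative input, not the real document
tokens = nltk.word_tokenize(text)       # ['document', 'is', 'a', 'test']
tagged = nltk.pos_tag(tokens)           # [('document', 'NN'), ('is', 'VBZ'), ...]

# one list of candidate synsets per token
output = [wn.synsets(token) for token, tag in tagged]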
I want to write a function (a loop, possibly) that extracts only the first match for each token and returns the result as a new list, like the following (using the example above):
[Synset('document.n.01'), Synset('be.v.01'), Synset('angstrom.n.01'), Synset('trial.n.02')]
What's the most flexible way to write such a function, so that it can be extended to other tokenized documents (with POS tagging)?
Thank you.
Here is a simple example of looping over a list of lists and taking the first element of each inner list; you can apply the same pattern to your data.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for x in a:
    print(x[0])  # first element of each inner list
The output looks like:
1
4
7
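Applied to your synset lists, the same pattern can be wrapped in a small function so it works for any tokenized (and POS-tagged) document. Here is a sketch, where output stands for whatever name your nested list of matches has; empty inner lists (tokens WordNet doesn't know) are skipped as a precaution:

def first_matches(synset_lists):
    # take the first synset from each token's list of candidates;
    # skip tokens that have no WordNet matches at all
    return [matches[0] for matches in synset_lists if matches]

first_matches(output)
# [Synset('document.n.01'), Synset('be.v.01'),
#  Synset('angstrom.n.01'), Synset('trial.n.02')]

If you would rather keep the positions of unknown tokens, return None for them instead of dropping them.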