Tags: python, list, nltk, wordnet, pos-tagger

Extracting the n-th element from each list in a list of lists


I have the following output, which I got by using nltk's .tokenize() and .pos_tag() and wordnet's .synsets(). It is a list of lists: for each token of the document, one list of potential matches based on wordnet's own part-of-speech tagging (here we have 4 tokens, hence 4 lists of matches):

[[Synset('document.n.01'),
  Synset('document.n.02'),
  Synset('document.n.03'),
  Synset('text_file.n.01'),
  Synset('document.v.01'),
  Synset('document.v.02')],
 [Synset('be.v.01'),
  Synset('be.v.02'),
  Synset('be.v.03'),
  Synset('exist.v.01'),
  Synset('be.v.05'),
  Synset('equal.v.01'),
  Synset('constitute.v.01'),
  Synset('be.v.08'),
  Synset('embody.v.02'),
  Synset('be.v.10'),
  Synset('be.v.11'),
  Synset('be.v.12'),
  Synset('cost.v.01')],
 [Synset('angstrom.n.01'),
  Synset('vitamin_a.n.01'),
  Synset('deoxyadenosine_monophosphate.n.01'),
  Synset('adenine.n.01'),
  Synset('ampere.n.02'),
  Synset('a.n.06'),
  Synset('a.n.07')],
 [Synset('trial.n.02'),
  Synset('test.n.02'),
  Synset('examination.n.02'),
  Synset('test.n.04'),
  Synset('test.n.05'),
  Synset('test.n.06'),
  Synset('test.v.01'),
  Synset('screen.v.01'),
  Synset('quiz.v.01'),
  Synset('test.v.04'),
  Synset('test.v.05'),
  Synset('test.v.06'),
  Synset('test.v.07')]]

I want to write a function (a loop, possibly) that extracts only the first match for each token and returns the result as a new list, such as the following (using the example above):

[Synset('document.n.01'), Synset('be.v.01'), Synset('angstrom.n.01'), Synset('trial.n.02')]

What's the most flexible way to write such a function, so that it can be extended to other tokenized documents (with pos tagging)?

Thank you.


Solution

  • Here is an example of looping over a list of this type; you can try the same with yours.

        a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        for x in a:
            print(x[0])

    The output looks like:

        1
        4
        7
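
The loop above prints the first elements, but the question asks for a new list. A minimal sketch of one way to collect them is below. The function name `nth_of_each` and its `n` and `default` parameters are made up for illustration and are not part of nltk. It assumes each inner list is an ordinary Python list; since wordnet's .synsets() returns an empty list for a token with no match, inner lists that are too short get `default` instead of raising an IndexError.

```python
def nth_of_each(list_of_lists, n=0, default=None):
    """Return a new list holding the n-th element of every inner list.

    Inner lists that have no element at index n contribute `default`
    (hypothetical helper, not an nltk function).
    """
    return [
        inner[n] if -len(inner) <= n < len(inner) else default
        for inner in list_of_lists
    ]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(nth_of_each(a))            # first element of each inner list
print(nth_of_each(a, n=2))       # third element of each inner list
print(nth_of_each([[1], []]))    # an empty inner list yields the default
```

With the list of synset lists from the question, calling `nth_of_each(synset_lists)` should give the first match per token, where `synset_lists` is whatever name you gave that output. If you would rather skip tokens with no match than keep a placeholder, filter out the `default` values afterwards.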