python nlp nltk pos-tagger lemmatization

Which comes first in order of implementation: POS Tagging or Lemmatisation?


If I wanted to build an NLP toolkit like NLTK, which feature should I implement first after tokenisation and normalisation: POS tagging or lemmatisation?


Solution

  • Part of speech matters for lemmatisation because many words have a different base form depending on the part of speech they are used as. With this information, the lemmatiser can return the correct base form, or lemma. So it would be better to implement POS tagging first.

    The main idea behind lemmatisation is to group the different inflected forms of a word into one. For example, go, going, gone and went all reduce to a single lemma: go. But to derive this, the lemmatiser has to know the context of a word, in particular whether it is being used as a noun, a verb and so on.

    So, the lemmatisation function can take the word and the part of speech as input and return the lemma after processing the information.
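
To make the dependency concrete, here is a minimal sketch of such a function in plain Python. It is only an illustration, not how NLTK does it: the names `lemmatise`, `lemmatise_tagged`, `EXCEPTIONS` and `SUFFIX_RULES`, the tag labels `"NOUN"` and `"VERB"`, and the tiny rule tables are all made up for this example. The point is that the signature already needs a POS tag, so the tagger has to exist before the lemmatiser can be used on real text.

```python
# Hypothetical exception table keyed by (word, part of speech).
# A real toolkit would load this from a lexicon; these entries are toy data.
EXCEPTIONS = {
    ("went", "VERB"): "go",
    ("gone", "VERB"): "go",
    ("going", "VERB"): "go",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}

# Deliberately naive suffix-stripping rules, one list per part of speech.
SUFFIX_RULES = {
    "VERB": ["ing", "ed", "s"],
    "NOUN": ["s"],
}


def lemmatise(word, pos):
    """Return a lemma for `word`, using `pos` to choose the rules."""
    word = word.lower()
    # Irregular or ambiguous forms are looked up first.
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    # Otherwise strip the first matching suffix allowed for this POS,
    # but only if at least three characters of stem would remain.
    for suffix in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    # Unknown POS or no matching rule: return the word unchanged.
    return word


def lemmatise_tagged(tagged_tokens):
    """Lemmatise the (word, pos) pairs produced by a POS tagger."""
    return [lemmatise(word, pos) for word, pos in tagged_tokens]
```

With this sketch, `lemmatise("meeting", "NOUN")` and `lemmatise("meeting", "VERB")` give different results for the same surface form, which is exactly the case a lemmatiser cannot resolve without a tag.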