
How do I lemmatize a doc with spaCy?


I have a spaCy doc that I would like to lemmatize.

For example:

import spacy
nlp = spacy.load('en_core_web_lg')

my_str = 'Python is the greatest language in the world'
doc = nlp(my_str)

How can I convert every token in the doc to its lemma?


Solution

  • Each token has a number of attributes, including its lemma. You can iterate through the doc to access them.

    For example: [token.lemma_ for token in doc]

    If you want to reconstruct the sentence as a single string, you could use: ' '.join(token.lemma_ for token in doc)

    For a full list of token attributes see: https://spacy.io/api/token#attributes
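
Putting the pieces together, here is a minimal sketch of both approaches as small helper functions. The names lemmas, lemmatized_text, and demo are made up for illustration and are not part of spaCy; the helpers only rely on the token attributes lemma_ and whitespace_. The second helper is a variation on the ' '.join(...) idea: it reuses each token's original trailing whitespace, so punctuation is not pushed away from the preceding word. The demo function assumes spaCy and the en_core_web_lg pipeline from the question are installed.

```python
from typing import Iterable, List


def lemmas(doc: Iterable) -> List[str]:
    """Return the lemma of every token, in document order."""
    return [token.lemma_ for token in doc]


def lemmatized_text(doc: Iterable) -> str:
    """Rebuild the text from lemmas, keeping each token's original trailing whitespace."""
    return "".join(token.lemma_ + token.whitespace_ for token in doc)


def demo() -> None:
    # Assumption: spaCy and the en_core_web_lg pipeline are installed locally.
    import spacy

    nlp = spacy.load("en_core_web_lg")
    doc = nlp("Python is the greatest language in the world")
    print(lemmas(doc))           # list of lemma strings, one per token
    print(lemmatized_text(doc))  # single string built from the lemmas
```

Calling demo() prints the list of lemmas and the rebuilt sentence. The exact lemmas depend on the pipeline and spaCy version in use.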