python pandas text lemmatization

TypeError during text lemmatization in a pandas DataFrame


I am working with text data and performing pre-processing steps on it.

I am using the spaCy module to lemmatize the text. I have written the code below:

import spacy
import de_core_news_sm
nlp = de_core_news_sm.load()

def spacy_lemma_text(text):
    doc = nlp(text)
    tokens = [tok.lemma_.lower().strip() for tok in doc]
    tokens = ' '.join(tokens)
    return tokens

df['spacy_lemma_text'] = data['Text'].apply(spacy_lemma_text)

The code raises the error below. I have tried many alternatives, and I think it is related to the pandas DataFrame. Please help me resolve it.

TypeError: 'NoneType' object does not support item assignment

Solution

  • One idea is to apply the function only to rows whose text is not missing (neither NaN nor None):

    m = data['Text'].notna()
    data.loc[m, 'spacy_lemma_text'] = data.loc[m, 'Text'].apply(spacy_lemma_text)
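
A minimal, self-contained sketch of the masked approach follows. It assumes a small made-up `data` frame and uses `fake_lemma_text`, a hypothetical stand-in for `spacy_lemma_text`, so that it runs without a spaCy model installed. The second part is a guess at the cause rather than something the question confirms: the message in the traceback is what Python raises when the object on the left of an item assignment is `None`, which would be the case if `df` (as opposed to `data`) had been set to `None` earlier, for example by assigning the result of an `inplace=True` call.

```python
import pandas as pd

def fake_lemma_text(text):
    # Hypothetical stand-in for spacy_lemma_text: lowercases and joins tokens,
    # so the example does not depend on the de_core_news_sm model.
    return ' '.join(tok.lower().strip() for tok in text.split())

# Made-up sample data containing both None and NaN entries.
data = pd.DataFrame({'Text': ['Die Katzen  schlafen', None, float('nan'), 'Guten Morgen']})

# Process only rows with actual text; the other rows keep a missing value.
m = data['Text'].notna()
data.loc[m, 'spacy_lemma_text'] = data.loc[m, 'Text'].apply(fake_lemma_text)

# Assumed cause of the original error: the assignment target is None.
df = None
try:
    df['spacy_lemma_text'] = data['Text']
except TypeError as err:
    message = str(err)  # 'NoneType' object does not support item assignment
```

If this matches your situation, assigning to `data` (as in the solution above) instead of `df` avoids the error, and the mask keeps the function from ever receiving a missing value.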