python nltk stemming lemmatization textblob

Why is my output returned in a string format that cannot be lemmatized/stemmed in Python?

First, I tokenize the text from a dataframe using NLTK. Then I apply spelling correction using TextBlob; for this, I convert the output from a tuple to a string. After that, I need to lemmatize/stem the result (using NLTK). The problem is that my output is returned in a string format, so it cannot be lemmatized/stemmed.

#create a dataframe
import functools
import operator

import nltk
import pandas as pd
from textblob import TextBlob
df = pd.DataFrame({'text': ["spellling", "was", "working cooking listening","studying"]})

#tokenization
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
def tokenize(text):
    return [w for w in w_tokenizer.tokenize(text)]
df["text2"] = df["text"].apply(tokenize)

#spelling correction
def spell_eng(text):
    text = TextBlob(str(text)).correct()
    #convert from tuple to str
    text = functools.reduce(operator.add, (text))
    return text
df['text3'] = df['text2'].apply(spell_eng)


#lemmatization/stemming
def stem_eng(text):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    return [lemmatizer.lemmatize(w, 'v') for w in text]
df['text4'] = df['text3'].apply(stem_eng)

Generated output:

Desired output:

text4
--------------
[spell]
[be]
[work,cook,listen]
[study]

Solution

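I found where the problem is: it comes from the spell_eng step. That function calls str() on the whole token list, so TextBlob corrects the list's printed form (brackets and quotes included), and text3 ends up holding a single string instead of a list of words. stem_eng then iterates over that string character by character, so the lemmatizer never sees whole words.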
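To see why a string breaks the lemmatization step, here is a minimal sketch in plain Python, without NLTK or TextBlob. The variable names are only for illustration; tokens stands in for one cell of df["text2"].

```python
# Hypothetical token list, like one cell of df["text2"].
tokens = ["working", "cooking", "listening"]

# str() on a list returns its printed form, brackets and quotes included.
as_text = str(tokens)
print(as_text)  # ['working', 'cooking', 'listening']

# Iterating over a string yields single characters ...
chars = [c for c in as_text]
print(chars[:4])  # ['[', "'", 'w', 'o']

# ... while iterating over the list yields whole words.
words = [w for w in tokens]
print(words)  # ['working', 'cooking', 'listening']
```

So any function that loops over a cell with "for w in text" only gets words if the cell is still a list.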

I have written a solution, which is a slight modification of your code.

import pandas as pd
import nltk
from textblob import TextBlob
import functools
import operator

df = pd.DataFrame({'text': ["spellling", "was", "working cooking listening","studying"]})

#tokenization
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
def tokenize(text):
    return [w for w in w_tokenizer.tokenize(text)]
df["text2"] = df["text"].apply(tokenize)


# spelling correction
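def spell_eng(text):
    # correct each token on its own so the result stays a list
    text = [TextBlob(str(w)).correct() for w in text] #CHANGE
    # correct() returns a TextBlob, not a tuple; joining its characters gives a plain str
    text = [functools.reduce(operator.add, (w)) for w in text] #CHANGE
    return text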

df['text3'] = df['text2'].apply(spell_eng)


# lemmatization/stemming
def stem_eng(text):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    return [lemmatizer.lemmatize(w,'v') for w in text] 
df['text4'] = df['text3'].apply(stem_eng)
print(df['text4'])
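As a side note, the functools.reduce step only turns each corrected word back into a string, so a plain str() call per word should do the same job. Below is a rough sketch of that idea. The helper correct_tokens and the stand-in corrector fake_correct are hypothetical names for illustration and are not part of TextBlob; in the real pipeline you would pass something like lambda w: TextBlob(w).correct() as the corrector.

```python
def correct_tokens(tokens, correct_word):
    """Correct each token on its own and return a list of plain strings.

    correct_word is any callable that takes one word and returns something
    that str() turns into the corrected word (hypothetical interface).
    """
    return [str(correct_word(w)) for w in tokens]


# Stand-in corrector for illustration only; it is NOT TextBlob.
KNOWN_FIXES = {"spellling": "spelling"}


def fake_correct(word):
    # Return the known fix if there is one, otherwise the word unchanged.
    return KNOWN_FIXES.get(word, word)


result = correct_tokens(["spellling", "was"], fake_correct)
print(result)  # ['spelling', 'was']
```

The point of the design is that the function always receives a list and always returns a list of str, which is the shape stem_eng expects.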

Hope these things help.