Python: list object has no attribute 'lower' - but corpus is already in lower case

My corpus is a series of documents with Twitter data, and has been cleaned and pre-processed to the best of my knowledge (even including emoji). Example below:

    0         [national, interest, think, worth, holding, ta...
    1         [must, accurate, diane, abbott, done, calculat...

I then instantiate TFIDF:

    # Instantiate vectoriser
    vect = TfidfVectorizer()

    # Fit
    vect = TfidfVectorizer(min_df=10, ngram_range = (1,3)).fit(text)

When I try to fit this, I get:

    AttributeError: 'list' object has no attribute 'lower'

But I've already converted everything to lower case. Is this something to do with the fact that it's a series?

Solution

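Per its documentation, TfidfVectorizer will "convert a collection of raw documents to a matrix of TF-IDF features", where each raw document is expected to be a string.

What you are passing instead is a pandas Series in which every element is a list of tokens. A small reproduction: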

    from sklearn.feature_extraction.text import TfidfVectorizer
    import pandas as pd

    l1 = 'national, interest, think, worth, holding,'.split(',')
    l2 = 'must, accurate, diane, abbott, done'.split(',')

    df = pd.DataFrame([[l1],[l2]])

    text = df[0]

which returns your text parameter as:

    0    [national,  interest,  think,  worth,  holding, ]
    1            [must,  accurate,  diane,  abbott,  done]
    Name: 0, dtype: object

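This will not work because, as already pointed out, TfidfVectorizer expects each document to be a string, not a list of tokens. By default it lowercases every document by calling .lower() on it, which is why a list raises the AttributeError even though your tokens are already in lower case. Slightly counter-intuitively, the fix is to join each list back into a single string and let the vectorizer tokenize it again.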
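
To make the cause of the error concrete, here is a minimal sketch in plain Python. The function `lowercase_step` is a hypothetical stand-in for the vectorizer's default lowercasing, not scikit-learn's actual code, and the token list is made up for illustration.

```python
def lowercase_step(doc):
    # Simplified stand-in (an assumption, not scikit-learn's real code) for the
    # default preprocessing, which calls .lower() on each document.
    return doc.lower()

# One row of the series: a list of tokens rather than a string.
tokens = ["national", "interest", "think"]

try:
    lowercase_step(tokens)
except AttributeError as err:
    message = str(err)
    print(message)  # 'list' object has no attribute 'lower'

# Joining the tokens gives a plain string, which the same step accepts.
document = " ".join(tokens)
print(lowercase_step(document))  # national interest think
```

The list fails because `lower` is a string method, regardless of the case of the tokens inside the list. The same kind of join, applied to the whole series: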

    # Join each list of tokens into one string, giving a list of strings
    corpus = text.apply(lambda x: ','.join(x)).to_list()

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)

    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    print(vectorizer.get_feature_names_out().tolist())

which prints:

    ['abbott', 'accurate', 'diane', 'done', 'holding', 'interest', 'must', 'national', 'think', 'worth']
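
One caveat when joining and re-tokenizing: the default `token_pattern` of TfidfVectorizer is `(?u)\b\w\w+\b`, which keeps only runs of two or more word characters. Since the question mentions emoji, the sketch below mimics that pattern with the standard `re` module to show what would survive. The token list is a made-up example, not taken from your data.

```python
import re

# Default token_pattern used by scikit-learn's text vectorizers.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

# Hypothetical pre-tokenized tweet with a one-letter token and an emoji.
tokens = ["national", "interest", "a", "\U0001F600", "think"]

# Join the same way as above, then re-tokenize with the default pattern.
joined = ",".join(tokens)
recovered = re.findall(DEFAULT_TOKEN_PATTERN, joined)

print(recovered)  # ['national', 'interest', 'think']
```

If single-character tokens or emoji matter for your analysis, you would probably need to pass your own `token_pattern` or `tokenizer` rather than rely on the default.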