
AttributeError: 'list' object has no attribute 'lower' (in a lowercase dataframe)


I don't understand this error. I already converted the dataframe to lowercase before turning it into a list.

dataframe:

    all_cols
0   who is your hero and why
1   what do you do to relax
2   this is a hero
4   how many hours of sleep do you get a night
5   describe the last time you were relax

Code:

from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

df['all_cols'] = df['all_cols'].str.lower()
df_list = df.values.tolist()

pipeline = Pipeline(steps=[
    ('tfidf', TfidfVectorizer()),
    # MeanShift needs dense input; toarray() returns an ndarray, whereas todense() returns np.matrix
    ('trans', FunctionTransformer(lambda x: x.toarray(), accept_sparse=True)),
    ('clust', MeanShift())])

pipeline.fit(df_list)
pipeline.named_steps['clust'].labels_

result = [(label,doc) for doc,label in zip(df_list, pipeline.named_steps['clust'].labels_)]

for label,doc in sorted(result):
    print(label, doc)

But I get an error on this line:

AttributeError Traceback (most recent call last) in

----> 1 pipeline.fit(df_list)

 2 pipeline.named_steps['clust'].labels_

AttributeError: 'list' object has no attribute 'lower'

Why does the program raise an error about 'lower' if I have already lowercased the dataframe?


Solution

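  • Select the column when building df_list to avoid nested lists. df.values.tolist() returns one inner list per row, so TfidfVectorizer ends up calling .lower() on a list instead of a string: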

    # Whole dataframe: each row becomes its own inner list
    df_list = df.values.tolist()
    print(df_list)
    [['who is your hero and why'], 
     ['what do you do to relax'], 
     ['this is a hero'], 
     ['how many hours of sleep do you get a night'], 
     ['describe the last time you were relax']]
    

    # Single column: a flat list of strings, which is what the vectorizer expects
    df_list = df['all_cols'].values.tolist()
    print(df_list)
    ['who is your hero and why', 
     'what do you do to relax', 
     'this is a hero',
     'how many hours of sleep do you get a night',
     'describe the last time you were relax']
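
To see why the nested list triggers the error, here is a minimal sketch in plain Python. It assumes only that the vectorizer lowercases each document by calling .lower() on it, which is what TfidfVectorizer does with its default lowercase=True. The helper names lowercase_docs and flatten_single_column are hypothetical, made up for this illustration, and are not part of pandas or scikit-learn.

```python
def lowercase_docs(docs):
    """Mimic the vectorizer's default preprocessing: call .lower() on each document."""
    return [doc.lower() for doc in docs]


def flatten_single_column(rows):
    """Turn [[text], [text], ...] (one-element rows) into [text, text, ...]."""
    return [row[0] for row in rows]


# Shape produced by df.values.tolist(): one inner list per row.
nested = [['who is your hero and why'], ['this is a hero']]

try:
    lowercase_docs(nested)
    error_message = None
except AttributeError as exc:
    # Each "document" is a list here, and lists have no .lower() method.
    error_message = str(exc)

print(error_message)

# Flattening the rows gives plain strings, so lowercasing works.
flat = flatten_single_column(nested)
print(lowercase_docs(flat))
```

The column was already lowercase, so the data itself was fine. The error comes from the shape of the input: every document handed to the vectorizer has to be a string, not a one-element list.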