Below is a method that I have tried coding. However, line 3 of the code raises an AttributeError saying that the 'WordListCorpusReader' object has no attribute 'word'. Could you please help me take a look at the code below? :((
'''step 3. conduct preprocessing steps'''
# setting up the resources for the preprocessing steps
stop = set(stopwords.word('english'))
exclude = set(string.punctuation)
lemma = WordNetLemmatizer()
def clean(doc):
    stop_free = ''.join([i for i in doc.lower().split() if i not in stop])
    punc_free = ''.join([ch for ch in stop_free if ch not in exclude])
    normalized = ''.join(wn.lemma.lemmatize(word) for word in punc_free.split())
    return normalized
doc_clean = [clean(doc).split() for doc in corpus]
'''step 4. prepare word representation'''
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
'''step 5. create lda model'''
topic_num = 5
word_num = 5
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=topic_num, id2word=dictionary, passes=20)
pprint(ldamodel.print_topics(num_topics=topic_num, num_words=word_num))
This is the traceback after running the code:
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/topicmodel/topicmodel.py", line 41, in <module>
    stop = set(stopwords.word('english'))
  File "C:\Users\user\AppData\Roaming\Python\Python37\site-packages\nltk\corpus\util.py", line 119, in __getattr__
    return getattr(self, attr)
AttributeError: 'WordListCorpusReader' object has no attribute 'word'
It's a typo. The method you should be calling is stopwords.words(). Change this:
stop = set(stopwords.word('english'))
into
stop = set(stopwords.words('english'))
and that should fix this issue.
More information is available on the NLTK documentation page: https://www.nltk.org/api/nltk.corpus.html?highlight=corpus#module-nltk.corpus
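Once the typo is fixed, the clean() function in the question will probably run into two further problems. First, ''.join glues the remaining words together with no space between them, so the later .split() returns one long token. Second, wn.lemma.lemmatize uses a name, wn, that is not defined anywhere in the posted snippet, while the lemmatizer itself is stored in lemma. Below is a minimal sketch of the same steps with those two points changed. So that it runs without downloading any NLTK data, the stop-word set and the lemmatizing function are passed in as parameters, and the demo uses small stand-ins (demo_stop and demo_lemmatize, both hypothetical). In the real script you would pass set(stopwords.words('english')) and lemma.lemmatize instead.

```python
import string

exclude = set(string.punctuation)

def clean(doc, stop, lemmatize):
    """Lowercase, drop stop words and punctuation, then lemmatize each word."""
    # Join with a space so the words stay separate tokens.
    stop_free = ' '.join(w for w in doc.lower().split() if w not in stop)
    # Punctuation is filtered character by character, so '' is correct here.
    punc_free = ''.join(ch for ch in stop_free if ch not in exclude)
    # Call the lemmatizer directly and join with a space again.
    return ' '.join(lemmatize(w) for w in punc_free.split())

# Hypothetical stand-ins so the sketch runs without NLTK data.
# In the real script: stop = set(stopwords.words('english'))
#                     lemmatize = WordNetLemmatizer().lemmatize
demo_stop = {'the', 'is', 'a', 'on'}

def demo_lemmatize(word):
    # Crude placeholder, not a real lemmatizer: strips a trailing 's'.
    return word[:-1] if word.endswith('s') and len(word) > 3 else word

corpus = ['The cats sat on a mat.', 'A dog is barking!']
doc_clean = [clean(doc, demo_stop, demo_lemmatize).split() for doc in corpus]
print(doc_clean)  # [['cat', 'sat', 'mat'], ['dog', 'barking']]
```

With the original ''.join calls, the first document would collapse into the single token 'catssatmat', which would leave the dictionary and the LDA model with almost nothing useful to work with.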