Problem with input features for Latent Dirichlet Allocation

I am trying to make predictions with my LDA model, but when I pass a string to it, it raises an error about mismatched input features. How can I make my model accept any input text and still predict the right topic? Right now it expects 54777 features as input.

model:

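from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
dtm = cv.fit_transform(npr['Article'])
LDA = LatentDirichletAllocation(n_components=7, random_state=42)
LDA.fit(dtm)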

prediction

txt = ["The election of Donald Trump was a surprise to pollsters, pundits and, perhaps most of all, the Democratic Party."]
vectorizer = CountVectorizer()
txt_vectorized = vectorizer.fit_transform(txt)
predict = LDA.transform(txt_vectorized)
print(predict)

error:

ValueError: X has 16 features, but LatentDirichletAllocation is expecting 54777 features as input.

Solution

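There are two issues with the prediction code, plus a note about the vectorizer parameters.

Issue-1: At prediction time you have to reuse the same CountVectorizer that was fitted on the training data. A new CountVectorizer builds a new vocabulary (16 words for the single sentence in the question), so its columns no longer match the 54777 features the LDA model was trained on.
Issue-2: At prediction time you have to call the transform method, not the fit_transform method of CountVectorizer. fit_transform relearns the vocabulary from the new text, while transform maps the text onto the existing vocabulary and ignores words it has not seen.
Note: max_df=0.95 and min_df=2 can be mixed. A float is interpreted as a proportion of documents and an int as an absolute document count, and the two parameters are handled independently.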
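To make the feature mismatch concrete, here is a small sketch comparing the number of columns produced by a freshly fitted vectorizer with the number produced by the already fitted one. The toy sentences and the names train_docs, new_doc, wrong and right are made up for illustration and are not from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the training articles
train_docs = [
    "the election surprised the pollsters",
    "the party lost the election",
    "pollsters and pundits were wrong",
]
new_doc = ["the election was a surprise to pundits"]

# Fit the vocabulary once, on the training data
cv = CountVectorizer()
dtm = cv.fit_transform(train_docs)

# Wrong: a new vectorizer learns a new, smaller vocabulary from the new text
wrong = CountVectorizer().fit_transform(new_doc)

# Right: the fitted vectorizer maps the new text onto the training vocabulary
right = cv.transform(new_doc)

print("training features:", dtm.shape[1])
print("new vectorizer features:", wrong.shape[1])
print("reused vectorizer features:", right.shape[1])
```

Only the reused vectorizer yields a matrix with the same number of columns as the training matrix, which is what the fitted LDA model checks for.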

Here is a complete example that will help you:

from sklearn.feature_extraction.text import CountVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
cv = CountVectorizer()

Train the model:

from sklearn.decomposition import LatentDirichletAllocation

# Learn the vocabulary and build the document-term matrix in one step
dtm = cv.fit_transform(corpus)
LDA = LatentDirichletAllocation(n_components=7, random_state=42)
LDA.fit(dtm)

Prediction:

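txt = ["This is a new document"]
# Reuse the fitted vectorizer: transform only, do not fit again
txt_vectorized = cv.transform(txt)
# One row per input document, one column per topic
predict = LDA.transform(txt_vectorized)
print(predict)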
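The output of LDA.transform is a matrix of topic weights per document, not a single label. If you want one "predicted topic", a common approach is to take the index of the largest weight in each row. The sketch below assumes that is what you mean by "predict the right topic"; the names lda, topic_dist and best_topic are illustrative.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# Fit the vectorizer and the topic model on the training corpus
cv = CountVectorizer()
dtm = cv.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=7, random_state=42)
lda.fit(dtm)

# Vectorize new text with the SAME fitted vectorizer
new_docs = ["This is a new document"]
topic_dist = lda.transform(cv.transform(new_docs))  # shape: (n_docs, n_topics)

# Assumption: the "predicted" topic is the one with the largest weight
best_topic = topic_dist.argmax(axis=1)

print(topic_dist.shape)
print(best_topic)
```

The topic indices are arbitrary labels, so to interpret them you would still need to inspect the top words of each topic (for example via lda.components_ and cv.get_feature_names_out() in recent scikit-learn versions).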