python nlp gensim lda

Gensim LDA : error cannot compute LDA over an empty collection (no terms)


I have the same error as in this thread: "ValueError: cannot compute LDA over an empty collection (no terms)", but the solution given there doesn't work in my case.

I'm working in a notebook with scikit-learn, where I've already fitted an LDA and an NMF model.

I'm now trying to do the same using Gensim: https://radimrehurek.com/gensim/auto_examples/tutorials/run_lda.htm

Here is the Python code from my notebook that shows what I'm trying to do:

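import gensim

# Build the vocabulary, then drop very rare and very common tokens
dic = gensim.corpora.Dictionary(texts_lem)
dic.filter_extremes(no_below=10, no_above=0.8)
# Convert each document to a bag-of-words vector
corpus = [dic.doc2bow(doc) for doc in texts_lem]

model = gensim.models.LdaModel(
    corpus=corpus,
    id2word=dic.id2token,
    num_topics=10,
)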

I'm reusing the existing texts_lem list from another section of my notebook for the Gensim LDA. I'm following the guide: create a dictionary, filter extremes, build a corpus and pass it to LdaModel().

Unfortunately, it doesn't work, and commenting out the filter_extremes line doesn't help (that was the answer in the other thread with the same error).

texts_lem is a list of lists of words, like the following:

[
 ['word', 'word', 'word', 'word'],
 ['word', 'word', 'word', 'word'],
 ['word', 'word', 'word', 'word'],
]

My error is :

ValueError: cannot compute LDA over an empty collection (no terms)

Many thanks for your help.


Solution

  • Just don't use id2token.

    Pass the dictionary itself as id2word. Your model should be:

    model = gensim.models.LdaModel(
        corpus=corpus,
        id2word=dic,
        num_topics=10,
    )
    

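    Works fine. The likely reason is that dic.id2token is filled lazily: it stays an empty dict until the dictionary is indexed for the first time (for example dic[0]), so LdaModel receives a vocabulary with no terms. Passing dic itself avoids this.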
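
To illustrate how an empty id2token could lead to "no terms", here is a minimal sketch in plain Python. ToyDictionary and count_terms are hypothetical stand-ins written for this example, not gensim's code. The sketch assumes that gensim's Dictionary only fills id2token on the first index lookup, and that LdaModel derives its vocabulary size from the id2word mapping it is given.

```python
class ToyDictionary:
    """Toy stand-in for a token dictionary; NOT gensim's actual class."""

    def __init__(self, documents):
        self.token2id = {}
        self.id2token = {}  # reverse mapping, left empty until first lookup
        for doc in documents:
            for token in doc:
                # Assign the next free integer id to each unseen token.
                self.token2id.setdefault(token, len(self.token2id))

    def __len__(self):
        return len(self.token2id)

    def keys(self):
        return list(self.token2id.values())

    def __getitem__(self, token_id):
        # Rebuild the reverse mapping only when it is out of date.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {i: t for t, i in self.token2id.items()}
        return self.id2token[token_id]


def count_terms(id2word):
    """Hypothetical helper: derive a vocabulary size from an id2word mapping."""
    return 1 + max(id2word.keys()) if len(id2word) > 0 else 0


texts = [["cat", "dog"], ["dog", "bird"]]
dic = ToyDictionary(texts)

terms_from_id2token = count_terms(dic.id2token)  # empty dict, so no terms
terms_from_dic = count_terms(dic)                # the dictionary knows its ids
_ = dic[0]                                       # first lookup fills id2token
terms_after_lookup = count_terms(dic.id2token)   # now populated

print(terms_from_id2token, terms_from_dic, terms_after_lookup)
```

Under the same assumptions, the two equivalent options in gensim would be to pass id2word=dic, or to evaluate dic[0] once before reading dic.id2token.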