
BERTopic Model - Visualization ignores 0th index


The BERTopic model produced the topics below:

[Image: table of the generated topics and their counts]

As shown above, the model is tuned to generate fewer outliers: the outlier topic -1 has a count of only 3 and appears last.

When I visualize the topics per class with

topic_model.visualize_topics_per_class(topics_per_class)

the interactive chart below is generated, but it omits index 0, that is, Topic 0. The Global Topic Representations shown are 1, 2, 3, 4, 5, 6, -1.

[Image: interactive topics-per-class chart]

Is BERTopic designed to assume that the very first index is always the outlier topic (-1) and to drop it unconditionally?

Are the generated topics always accessed by count, perhaps in descending order?


Solution

  • This issue was also posted on the BERTopic GitHub page, and the author himself responded:

    [Image: screenshot of the author's response]

    Setting top_n_topics=None displays all topics, including Topic 0, in the visualization:

    topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=None)
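
A minimal sketch of why top_n_topics=None could make the difference. It is an assumption about the selection logic, not BERTopic's actual code: it supposes that topics are ranked by descending count and that, when a top-n limit is set, the first row is skipped on the expectation that it is the outlier topic -1. The function name select_topics and all counts are hypothetical, except the outlier count of 3, which comes from the question.

```python
# Hypothetical topic counts; only the outlier count of 3 comes from the question.
topic_counts = {0: 120, 1: 80, 2: 60, 3: 40, 4: 30, 5: 20, 6: 10, -1: 3}


def select_topics(counts, top_n_topics=10):
    """Illustrative selection logic (an assumption, not BERTopic's code)."""
    # Rank topic ids by descending count, as a frequency table would.
    ranked = sorted(counts, key=counts.get, reverse=True)
    if top_n_topics is None:
        # No limit: keep every topic, including the first row.
        return ranked
    # Assumed behaviour: skip the first row, expecting it to be -1.
    return ranked[1:top_n_topics + 1]


default_selection = select_topics(topic_counts)
full_selection = select_topics(topic_counts, top_n_topics=None)

print(default_selection)  # [1, 2, 3, 4, 5, 6, -1]
print(full_selection)     # [0, 1, 2, 3, 4, 5, 6, -1]
```

Under these assumptions, a model with very few outliers puts Topic 0 rather than -1 in the first row, so Topic 0 is the one that gets skipped. That matches the 1, 2, 3, 4, 5, 6, -1 ordering reported in the question.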