How to fix random seed for BERTopic?

I'd like to fix the random seed of the BERTopic library to get reproducible results. Looking at the BERTopic code, I see it uses numpy. Will np.random.seed(123) be enough, or do I also need to seed other libraries such as random or PyTorch, as in this question?

Solution

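You can fix the seed through UMAP's random_state parameter. Passing your own UMAP instance replaces the one BERTopic would otherwise build, so you also have to pass the other default parameters to the UMAP constructor; otherwise UMAP falls back to its own defaults and the model will break.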

In practice, this looks like:

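from bertopic import BERTopic
from umap import UMAP

# Pass BERTopic's usual UMAP settings explicitly, plus a fixed random_state.
umap_model = UMAP(n_neighbors=15,
                  n_components=5,
                  min_dist=0.0,
                  metric='cosine',
                  low_memory=False,
                  random_state=1337)
model = BERTopic(language="multilingual", umap_model=umap_model)
# content is the list of documents to model.
topics, probs = model.fit_transform(content)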

By default, umap_model is set to None in the BERTopic constructor. If it is not provided, BERTopic internally creates a UMAP model with its own default parameters.

Note that low_memory is a parameter of both constructors. If the BERTopic constructor is called without it, BERTopic internally sets it to False.
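
If you want to try several seeds, it may help to keep the UMAP settings above in one place so that only random_state changes between runs. The following is a minimal sketch under that assumption; seeded_umap_kwargs and build_seeded_bertopic are hypothetical helper names for illustration, not part of BERTopic or umap-learn.

```python
# Hypothetical helpers (not part of BERTopic or umap-learn): they only
# repackage the UMAP settings from the answer so the seed is the one knob.

def seeded_umap_kwargs(random_state=1337):
    """Return the UMAP keyword arguments used in the answer, with a fixed seed."""
    return {
        "n_neighbors": 15,
        "n_components": 5,
        "min_dist": 0.0,
        "metric": "cosine",
        "low_memory": False,
        "random_state": random_state,
    }


def build_seeded_bertopic(random_state=1337, language="multilingual"):
    """Build a BERTopic model whose UMAP step uses the given seed."""
    # Imported lazily so seeded_umap_kwargs works without these packages.
    from bertopic import BERTopic
    from umap import UMAP

    umap_model = UMAP(**seeded_umap_kwargs(random_state))
    return BERTopic(language=language, umap_model=umap_model)
```

A simple check is to call build_seeded_bertopic(1337).fit_transform(content) twice on the same documents and compare the returned topics. Treat this as a sanity check on one machine rather than a guarantee, since results may still differ across library versions or hardware.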