I'd like to fix the random seed in the BERTopic library to get reproducible results. Looking at the BERTopic code, I see it uses numpy. Will calling np.random.seed(123) be enough, or do I also need to seed other libraries such as random or pytorch, as in this question?
You can fix the seed by passing random_state to UMAP, but you also have to pass the other default parameters to the UMAP constructor, or the model will break.
What this looks like in practice is:
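from umap import UMAP
from bertopic import BERTopic

# Same values as BERTopic's default UMAP settings, plus a fixed random_state
umap_model = UMAP(n_neighbors=15,
                  n_components=5,
                  min_dist=0.0,
                  metric='cosine',
                  low_memory=False,
                  random_state=1337)
model = BERTopic(language="multilingual", umap_model=umap_model)
# content is your list of documents (strings)
topics, probs = model.fit_transform(content)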
By default, umap_model is set to None in the BERTopic constructor. Internally, if it is not provided, BERTopic sets one up with default params here in the code.
Note that low_memory is a param in both constructors, and if the BERTopic constructor isn't called with it, it internally sets it to False.
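If you want to change one or two UMAP settings without losing the rest, it may help to keep the defaults in one place and merge your changes in. This is only a minimal sketch: the helper name seeded_umap_kwargs and the constant BERTOPIC_UMAP_DEFAULTS are hypothetical (not part of BERTopic or UMAP), and the values are copied from the snippet above, so check them against the BERTopic version you have installed.

```python
# Hypothetical helper, not part of BERTopic or UMAP.
# Values copied from the snippet above; verify against your BERTopic version.
BERTOPIC_UMAP_DEFAULTS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "low_memory": False,
}


def seeded_umap_kwargs(random_state, **overrides):
    """Return UMAP keyword arguments: the defaults above, any overrides, and a fixed seed."""
    kwargs = dict(BERTOPIC_UMAP_DEFAULTS)  # copy so the defaults stay untouched
    kwargs.update(overrides)               # apply caller-supplied changes
    kwargs["random_state"] = random_state  # the seed is always set explicitly
    return kwargs


kwargs = seeded_umap_kwargs(1337)

# Usage (requires umap-learn and bertopic to be installed):
# from umap import UMAP
# from bertopic import BERTopic
# model = BERTopic(language="multilingual", umap_model=UMAP(**kwargs))
```

Calling seeded_umap_kwargs(1337, n_neighbors=30) would change only n_neighbors and keep the other values and the seed as they are.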