
sklearn random_state is not working properly


I have read everything I could find on this but still do not understand what the problem is. I use TruncatedSVD with a fixed random_state and then print explained_variance_ratio_.sum() for it. The value changes every time I run the code. Is this normal?

from sklearn.decomposition import TruncatedSVD
SVD = TruncatedSVD(n_components=40, n_iter=7, random_state=42)

XSVD = SVD.fit_transform(X)
print(SVD.explained_variance_ratio_.sum())

The problem is that I later run UMAP on the result and plot it, and I get a different graph every time I run the code. I cannot tell whether this is due to TruncatedSVD or to UMAP. I set random_state=42 to keep the results from changing, but it seems to have no effect.


Solution

  • You are probably doing something wrong elsewhere, because I cannot reproduce your issue with scikit-learn 0.22. Fitting on the same X three times with random_state=42 prints the same value each time:

    In [16]: import numpy as np 
        ...: from sklearn.decomposition import TruncatedSVD 
        ...:  
        ...: rng = np.random.RandomState(42) 
        ...: X = rng.randn(10000, 100) 
        ...: def func(X): 
        ...:     SVD = TruncatedSVD(n_components=40, n_iter=7, random_state=42) 
        ...:     XSVD = SVD.fit_transform(X) 
        ...:     print(SVD.explained_variance_ratio_.sum()) 
        ...: func(X);func(X);func(X);                                                                                                                                           
    0.43320350603512425
    0.43320350603512425
    0.43320350603512425
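
To narrow down which stage varies, it can help to compare the SVD output of two separate fits directly rather than judging by the final plot. The following is a minimal sketch, not a definitive diagnosis: the helper name svd_embedding and the synthetic X are assumptions made for illustration, and your real X should be substituted in.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def svd_embedding(X, seed=42):
    """Fit a fresh TruncatedSVD and return (embedding, summed variance ratio).

    Hypothetical helper for illustration; parameters mirror the question.
    """
    svd = TruncatedSVD(n_components=40, n_iter=7, random_state=seed)
    embedding = svd.fit_transform(X)
    return embedding, svd.explained_variance_ratio_.sum()


# Synthetic stand-in for the real data (an assumption, replace with your X).
rng = np.random.RandomState(0)
X = rng.randn(2000, 100)

# Two independent fits on the same input with the same seed.
first, first_ratio = svd_embedding(X)
second, second_ratio = svd_embedding(X)

same_embedding = np.allclose(first, second)
same_ratio = np.isclose(first_ratio, second_ratio)
print("embeddings match:", same_embedding)
print("variance ratios match:", same_ratio)
```

If both checks pass on your data, the SVD step is reproducible and the variation most likely comes from a later stage or from X itself changing between runs (for example, a shuffle or sampling step without a fixed seed). The same comparison can be applied to the array returned by the UMAP step.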