python, machine-learning, scikit-learn

fit method in sklearn


I have a couple of questions about the fit method in sklearn.

Question 1: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)

Does the content of the variable model change at all during the process?

Question 2: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = svd_1.fit(X2)

What is happening to svd_1? In other words, svd_1 has already been fitted and I fit it again, so what is happening to its components?


Solution

  • Question 1: Does the content of the variable model change at all during the process?

    Yes. The fit method modifies the object in place and returns a reference to that same object. So take care: in the first example, all three variables model, svd_1, and svd_2 refer to the same object.

    from sklearn.decomposition import TruncatedSVD
    model = TruncatedSVD()
    svd_1 = model.fit(X1)
    svd_2 = model.fit(X2)
    print(model is svd_1 is svd_2)  # prints True
    

  • Question 2: What is happening to svd_1?

    model and svd_1 refer to the same object, so calling svd_1.fit(X2) is exactly the same as calling model.fit(X2). There is no difference between the first and the second example.
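
    To see this in practice, here is a minimal sketch. The arrays X1 and X2 are synthetic stand-ins (their shapes and values are assumptions, not taken from the question), and n_components and random_state are set only to make the run reproducible. It snapshots components_ after the first fit and compares it with what the same object holds after the second fit.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-ins for X1 and X2 (assumed data; any 2-D arrays would do).
rng = np.random.RandomState(0)
X1 = rng.rand(20, 5)
X2 = rng.rand(20, 5) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])

model = TruncatedSVD(n_components=2, random_state=0)

svd_1 = model.fit(X1)
# Copy the fitted attribute, otherwise we would only keep a reference
# to state that the next fit replaces.
components_after_X1 = svd_1.components_.copy()

svd_2 = svd_1.fit(X2)  # equivalent to model.fit(X2)

# All three names point at one estimator object.
same_object = model is svd_1 is svd_2
# The components learned from X1 are gone; the object now holds the X2 fit.
overwritten = not np.allclose(components_after_X1, model.components_)

print(same_object)
print(overwritten)
```

    Both flags should come out as True: there is only one estimator, and it only remembers the most recent fit.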

    Final remark: in both examples the result of fit(X1) is overwritten by fit(X2), as pointed out in the answer by David Maust. If you want two different models fitted to two different sets of data, create a separate estimator for each:

    svd_1 = TruncatedSVD().fit(X1)
    svd_2 = TruncatedSVD().fit(X2)
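
    If the estimator carries non-default settings, one possible variation is to keep a single unfitted "template" and copy it with sklearn.base.clone before each fit. The sketch below assumes synthetic data for X1 and X2 and a hypothetical variable name template; it is an illustration, not part of the original answer.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-ins for X1 and X2 (assumed data).
rng = np.random.RandomState(0)
X1 = rng.rand(20, 5)
X2 = rng.rand(20, 5) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Unfitted estimator that only holds the settings ("template" is a made-up name).
template = TruncatedSVD(n_components=2, random_state=0)

# clone() builds a new unfitted estimator with the same parameters,
# so each fit lands on its own object.
svd_1 = clone(template).fit(X1)
svd_2 = clone(template).fit(X2)

independent = svd_1 is not svd_2
template_untouched = not hasattr(template, "components_")
different_fits = not np.allclose(svd_1.components_, svd_2.components_)

print(independent, template_untouched, different_fits)
```

    Here svd_1 keeps the components fitted on X1 while svd_2 holds those fitted on X2, and the template itself is never fitted.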