I have a few questions about the fit method in sklearn.
Question 1: when I do:
from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)
Does the content of the variable model change at all during this process?
Question 2: when I do:
from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = svd_1.fit(X2)
What is happening to svd_1? In other words, svd_1 has already been fitted and I fit it again, so what happens to its components?
Question 1: Does the content of the variable model change at all during the process?
Yes. The fit method modifies the object in place, and it returns a reference to that same object (it returns self). So take care! In the first example, all three variables model, svd_1, and svd_2 actually refer to the same object.
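To make the convention concrete without any library, here is a minimal sketch using a hypothetical MeanEstimator class (it is not part of scikit-learn). It mimics what sklearn estimators such as TruncatedSVD do: fit stores the learned state on self and then returns self, so every name bound to the result points at the same object.

```python
# Hypothetical toy estimator (NOT a scikit-learn class) that follows the same
# convention as sklearn estimators: fit() stores learned state on self and returns self.
class MeanEstimator:
    def fit(self, X):
        # Learned attributes get a trailing underscore, as in sklearn (e.g. components_)
        self.mean_ = sum(X) / len(X)
        return self  # the same object is returned, not a copy


model = MeanEstimator()
m_1 = model.fit([1.0, 2.0, 3.0])
m_2 = model.fit([10.0, 20.0])

print(model is m_1 is m_2)  # True: one object, three names
print(m_1.mean_)            # 15.0: the state from the first fit was overwritten
```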
from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)
print(model is svd_1 is svd_2) # prints True
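The identity check above shows that the three names are one object; the sketch below shows that the learned components_ are replaced as well. The random matrices standing in for X1 and X2 and the choice n_components=2, random_state=0 are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random data stands in for X1 and X2 (assumption: any two numeric matrices with 5 columns)
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 5))
X2 = rng.normal(size=(20, 5))

model = TruncatedSVD(n_components=2, random_state=0)
svd_1 = model.fit(X1)
components_from_X1 = svd_1.components_.copy()  # explicit snapshot taken before refitting

svd_2 = model.fit(X2)

# svd_1 is the same object as model, so it now carries the components learned from X2
print(np.allclose(svd_1.components_, components_from_X1))

# With a fixed integer random_state, refitting gives the same result as a fresh model on X2
fresh = TruncatedSVD(n_components=2, random_state=0).fit(X2)
print(np.allclose(svd_1.components_, fresh.components_))
```

With independent random data the first comparison is expected to be False, since the components learned from X1 are gone.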
Question 2: What is happening to svd_1?
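model and svd_1 refer to the same object, so there is absolutely no difference between the first and the second example. Calling svd_1.fit(X2) refits that one object, and the components it learned from X1 are replaced by the ones learned from X2.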
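If the state fitted on X1 must survive a second call to fit, one option (not the only one) is to deep-copy the fitted estimator first with the standard-library copy.deepcopy. This is a sketch; the random data and the parameter values are assumptions chosen only to make it runnable.

```python
import copy

import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random data stands in for X1 and X2 (assumption for illustration only)
rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 6))
X2 = rng.normal(size=(30, 6))

model = TruncatedSVD(n_components=2, random_state=0)
svd_1 = model.fit(X1)

# Deep-copy the fitted estimator so the state learned from X1 is preserved
svd_1_snapshot = copy.deepcopy(svd_1)

svd_2 = svd_1.fit(X2)  # still the same object as model and svd_1, now fitted on X2

print(svd_2 is model)           # True
print(svd_1_snapshot is svd_1)  # False: the snapshot is an independent object
```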
Final Remark:
What happens in both examples is that the result of fit(X1) is overwritten by fit(X2), as pointed out in the answer by David Maust. If you want to have two different models fitted to two different sets of data, you need to do something like this:
svd_1 = TruncatedSVD().fit(X1)
svd_2 = TruncatedSVD().fit(X2)
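When both models should share the same hyperparameters, a variant of the same idea is to keep one unfitted template and call sklearn.base.clone, which returns a new, unfitted estimator with the same parameters. The template name, the random data, and the parameter values here are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import TruncatedSVD

# Random data stands in for X1 and X2 (assumption for illustration only)
rng = np.random.default_rng(2)
X1 = rng.normal(size=(25, 4))
X2 = rng.normal(size=(25, 4))

# One template holds the hyperparameters; clone() gives an unfitted copy each time
template = TruncatedSVD(n_components=2, random_state=0)
svd_1 = clone(template).fit(X1)
svd_2 = clone(template).fit(X2)

print(svd_1 is svd_2)                            # False: two independent estimators
print(svd_1.get_params() == svd_2.get_params())  # True: identical hyperparameters
print(hasattr(template, "components_"))          # False: the template itself was never fitted
```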