What does TruncatedSVD get_params([deep]) really do?

I don't understand the get_params([deep]) method available for TruncatedSVD in sklearn. Can someone please explain it to me?

Solution

Check out the source of get_params here: https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/base.py#L213

This is not specific to TruncatedSVD: essentially all scikit-learn estimators have this method, because they inherit it from the BaseEstimator class.
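To make the inheritance point concrete, here is a simplified sketch of the idea behind BaseEstimator.get_params. It is not the actual scikit-learn source: MiniBaseEstimator and ToySVD are hypothetical names made up for illustration, and the real implementation has more checks. The core idea is the same, though: read the argument names from the __init__ signature and look each one up as an attribute.

```python
import inspect


class MiniBaseEstimator:
    """Toy stand-in for sklearn.base.BaseEstimator (hypothetical, simplified)."""

    @classmethod
    def _get_param_names(cls):
        # Collect every constructor argument name except `self`.
        signature = inspect.signature(cls.__init__)
        return sorted(name for name in signature.parameters if name != "self")

    def get_params(self, deep=True):
        params = {}
        for name in self._get_param_names():
            # Each constructor argument is expected to be stored under the same name.
            value = getattr(self, name)
            params[name] = value
            # With deep=True, nested estimators also report their own
            # parameters under "<name>__<param>" keys.
            if deep and hasattr(value, "get_params") and not isinstance(value, type):
                for sub_name, sub_value in value.get_params().items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class ToySVD(MiniBaseEstimator):
    # Hypothetical estimator with two constructor parameters.
    def __init__(self, n_components=2, n_iter=5):
        self.n_components = n_components
        self.n_iter = n_iter


toy_params = ToySVD(n_components=5).get_params()
print(toy_params)  # {'n_components': 5, 'n_iter': 5}
```

This is also why scikit-learn estimators are expected to store every constructor argument unchanged under an attribute of the same name: get_params relies on that convention.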

And as the name says, it returns the values of the parameters the estimator was constructed with. In your case, check out the list of parameters here: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

n_components : int, default = 2
algorithm : string, default = "randomized"
n_iter : int, optional (default 5)
random_state : int, RandomState instance or None, optional, default = None
tol : float, optional

Let's say you initialize TruncatedSVD with the following code:

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

Calling svd.get_params() will then return:

{'algorithm': 'randomized',
 'n_components': 5,
 'n_iter': 7,
 'random_state': 42,
 'tol': 0.0}
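For completeness, here is a runnable version of the snippet above, including the import and the get_params call itself. The exact set of keys depends on the installed scikit-learn version (newer releases may list additional parameters), so the sketch only prints the ones discussed here.

```python
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

# get_params returns a plain dict mapping constructor argument names to values.
params = svd.get_params()

# Print only the parameters discussed above; other keys may exist
# depending on the scikit-learn version.
for name in ("algorithm", "n_components", "n_iter", "random_state", "tol"):
    print(name, "=", params[name])
```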

This is useful for cloning the object, and it is used extensively in scikit-learn utilities such as cross_val_score, GridSearchCV, Pipeline, etc.
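As a rough illustration of the cloning use case: an unfitted copy can be built by passing the result of get_params(deep=False) back to the constructor. This is only a sketch of the idea; sklearn.base.clone is the supported way to do it, and it additionally copies the parameter values themselves.

```python
from sklearn.base import clone
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

# Manual "clone": rebuild the estimator from its own constructor parameters.
manual_copy = TruncatedSVD(**svd.get_params(deep=False))

# The supported way: clone() returns a new, unfitted estimator
# with the same parameters.
official_copy = clone(svd)

print(manual_copy.get_params() == svd.get_params())
print(official_copy.get_params() == svd.get_params())
```

Both copies are distinct objects that carry the same parameters but none of the fitted state.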

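If deep=True (the default), it returns the estimator's own parameters and, in addition, the parameters of any inner estimators. TruncatedSVD has no inner estimators, so deep makes no difference there. For example, take this code: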

from sklearn import svm
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])

The output of anova_svm.get_params(deep=False) is below:

{'memory': None,
 'steps': [('anova',
   SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>)),
  ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False))]}

And below is the output of anova_svm.get_params(True):

{'anova': SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>),
 'anova__k': 5,
 'anova__score_func': <function sklearn.feature_selection.univariate_selection.f_regression>,
 'memory': None,
 'steps': [('anova',
   SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>)),
  ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False))],
 'svc': SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
   max_iter=-1, probability=False, random_state=None, shrinking=True,
   tol=0.001, verbose=False),
 'svc__C': 1.0,
 'svc__cache_size': 200,
 'svc__class_weight': None,
 'svc__coef0': 0.0,
 'svc__decision_function_shape': 'ovr',
 'svc__degree': 3,
 'svc__gamma': 'auto',
 'svc__kernel': 'linear',
 'svc__max_iter': -1,
 'svc__probability': False,
 'svc__random_state': None,
 'svc__shrinking': True,
 'svc__tol': 0.001,
 'svc__verbose': False}

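You can see that the output now also contains the parameter values of SVC and SelectKBest, which are the inner estimators of the Pipeline. Each nested parameter is named with the step name, a double underscore, and the parameter name (for example svc__C).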
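The double-underscore names are not just for display. The same keys are accepted by set_params, and they are what you use as keys in a GridSearchCV param_grid (for example {'svc__C': [0.1, 1, 10]}). A small sketch that rebuilds the pipeline from above and changes two nested parameters:

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

anova_svm = Pipeline([
    ("anova", SelectKBest(f_regression, k=5)),
    ("svc", SVC(kernel="linear")),
])

# Names of the form "<step>__<param>" reported by get_params(deep=True).
nested_names = sorted(name for name in anova_svm.get_params(deep=True) if "__" in name)

# The same names can be passed to set_params to update the inner estimators.
anova_svm.set_params(anova__k=3, svc__C=0.5)

updated = anova_svm.get_params(deep=True)
print(updated["anova__k"], updated["svc__C"])
```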