scikit-learn svd sklearn-pandas

What does TruncatedSVD get_params([deep]) really do?


I don't understand the get_params([deep]) method available for TruncatedSVD in sklearn. Can someone please explain it to me?


Solution

  • Check out the source of get_params here: https://github.com/scikit-learn/scikit-learn/blob/a24c8b46/sklearn/base.py#L213

    This applies not just to TruncatedSVD: essentially all scikit-learn estimators have this method, because they inherit it from the BaseEstimator class.
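
    To make the inheritance idea concrete, here is a minimal sketch of how such a method can work: it reads the parameter names from the constructor signature and looks up the attributes of the same name. The classes SketchEstimator and ToySVD are hypothetical stand-ins written for this illustration, not scikit-learn's actual code, and the sketch ignores the deep argument entirely.

```python
import inspect


class SketchEstimator:
    """Hypothetical stand-in for a base class; not scikit-learn's code."""

    def get_params(self):
        # Read the parameter names from the subclass constructor signature.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        # Each constructor argument is assumed to be stored on the instance
        # under the same name, which is the convention estimators follow.
        return {name: getattr(self, name) for name in sorted(names)}


class ToySVD(SketchEstimator):
    """Hypothetical estimator that only stores its constructor arguments."""

    def __init__(self, n_components=2, n_iter=5):
        self.n_components = n_components
        self.n_iter = n_iter


toy = ToySVD(n_components=5)
toy_params = toy.get_params()
print(toy_params)
```

    Because the lookup is driven by the constructor signature, any subclass gets a working get_params without writing one itself, as long as it stores its arguments under their own names.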

    As the name says, get_params returns the values of the parameters set on the estimator. In your case, check out the list of parameters here: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

    n_components : int, default = 2
    algorithm : string, default = "randomized"
    n_iter : int, optional (default 5)
    random_state : int, RandomState instance or None, optional, default = None
    tol : float, optional
    

    Let's say you initialize the TruncatedSVD with the following code:

    from sklearn.decomposition import TruncatedSVD
    svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
    

    The output of svd.get_params() will be:

    {'algorithm': 'randomized',
     'n_components': 5,
     'n_iter': 7,
     'random_state': 42,
     'tol': 0.0}
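
    One point worth checking for yourself: get_params reports constructor parameters only, so fitting the estimator should not change what it returns, and learned attributes such as components_ are not part of the result. The sketch below assumes a small random matrix X as stand-in data. Newer scikit-learn releases may also list a few more parameters than the five shown above.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
params_before_fit = svd.get_params()

# Stand-in data for illustration: 20 samples, 10 features.
X = np.random.RandomState(0).rand(20, 10)
svd.fit(X)
params_after_fit = svd.get_params()

# Fitting does not alter the constructor parameters.
print(params_before_fit == params_after_fit)

# Learned attributes live on the object, not in the parameter dict.
print("components_" in params_after_fit)
print(svd.components_.shape)
```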
    

    get_params is useful for cloning an estimator, and it is used extensively in scikit-learn utilities such as cross_val_score, GridSearchCV, and Pipeline.
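
    As a rough illustration of the cloning use case, the sketch below compares sklearn.base.clone with rebuilding the estimator by hand from its parameter dict. The manual version only approximates what clone does (clone also copies nested estimators), so treat it as a way to see the idea rather than a replacement.

```python
from sklearn.base import clone
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

# clone() returns a new, unfitted estimator with the same parameters.
svd_copy = clone(svd)

# Roughly the same idea by hand: feed the parameters back to the constructor.
svd_manual = TruncatedSVD(**svd.get_params())

print(svd_copy is svd)
print(svd_copy.get_params() == svd.get_params())
print(svd_manual.get_params() == svd.get_params())
```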

    If deep=True (the default), it returns the estimator's own parameters and, in addition, the parameters of any inner estimators. For example, take this code:

    from sklearn import svm
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import f_regression
    from sklearn.pipeline import Pipeline
    anova_filter = SelectKBest(f_regression, k=5)
    clf = svm.SVC(kernel='linear')
    anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
    

    The output of anova_svm.get_params(deep=False) is below:

    {'memory': None,
     'steps': [('anova',
       SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>)),
      ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
         decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
         max_iter=-1, probability=False, random_state=None, shrinking=True,
         tol=0.001, verbose=False))]}
    

    And below is the output of anova_svm.get_params(True):

    {'anova': SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>),
     'anova__k': 5,
     'anova__score_func': <function sklearn.feature_selection.univariate_selection.f_regression>,
     'memory': None,
     'steps': [('anova',
       SelectKBest(k=5, score_func=<function f_regression at 0x7fb34d50ede8>)),
      ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
         decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
         max_iter=-1, probability=False, random_state=None, shrinking=True,
         tol=0.001, verbose=False))],
     'svc': SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
       decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
       max_iter=-1, probability=False, random_state=None, shrinking=True,
       tol=0.001, verbose=False),
     'svc__C': 1.0,
     'svc__cache_size': 200,
     'svc__class_weight': None,
     'svc__coef0': 0.0,
     'svc__decision_function_shape': 'ovr',
     'svc__degree': 3,
     'svc__gamma': 'auto',
     'svc__kernel': 'linear',
     'svc__max_iter': -1,
     'svc__probability': False,
     'svc__random_state': None,
     'svc__shrinking': True,
     'svc__tol': 0.001,
     'svc__verbose': False}
    

    You can see that the output now also contains the parameter values of SVC and SelectKBest, the inner estimators of the Pipeline, under keys of the form <step name>__<parameter name>.
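
    The nested keys are not just for display: the same names can be passed to set_params, and they are the names a GridSearchCV parameter grid uses for pipeline steps. A minimal sketch, rebuilding the same pipeline as above and choosing C=10.0 and k=3 purely as example values:

```python
from sklearn import svm
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline

anova_svm = Pipeline([
    ("anova", SelectKBest(f_regression, k=5)),
    ("svc", svm.SVC(kernel="linear")),
])

shallow = anova_svm.get_params(deep=False)
deep = anova_svm.get_params(deep=True)

# Keys that only appear when deep=True: the steps and their parameters.
nested_keys = sorted(set(deep) - set(shallow))
print(nested_keys)

# The same '<step>__<param>' names work with set_params.
anova_svm.set_params(svc__C=10.0, anova__k=3)
print(anova_svm.named_steps["svc"].C, anova_svm.named_steps["anova"].k)
```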