python, scikit-learn, python-import

Understanding scikit-learn import variants


The import statements in the scikit-learn tutorials take the form

from sklearn.decomposition import PCA

Another version that works is

import sklearn.decomposition
pca = sklearn.decomposition.PCA(n_components=2)

However

import sklearn
pca = sklearn.decomposition.PCA(n_components=2)

does not work, and complains

AttributeError: module 'sklearn' has no attribute 'decomposition'

Why is this, and how can I predict which variants will work and which will not, so I don't have to find out by trial and error? Ideally the explanation would also apply to Python packages in general.


Solution

  • sklearn doesn't automatically import its submodules. To use sklearn.<SUBMODULE>, you need to import it explicitly, e.g. import sklearn.<SUBMODULE>. After that you can use it without any further imports, e.g. result = sklearn.<SUBMODULE>.function(...).

    Large packages often behave this way: they don't automatically import all of their submodules.

    Loading every submodule automatically would cost memory and start-up time; importing only the submodules you need avoids both. I think namespace clutter is another consideration: explicit imports reduce the chance of naming conflicts and make it clear which functionality is being used.
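As for predicting which variant will work: in general, a submodule is available as an attribute after a bare import of the package only if something has already imported it, typically the package's own __init__.py. The sketch below illustrates this with two throwaway packages written to a temporary directory. The names lazy_pkg and eager_pkg and their contents are hypothetical stand-ins, not part of scikit-learn.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build two throwaway packages on disk (all names here are hypothetical).
root = Path(tempfile.mkdtemp())

# lazy_pkg: empty __init__.py, so no submodule is imported automatically.
(root / "lazy_pkg").mkdir()
(root / "lazy_pkg" / "__init__.py").write_text("")
(root / "lazy_pkg" / "decomposition.py").write_text("class PCA:\n    pass\n")

# eager_pkg: __init__.py imports the submodule itself.
(root / "eager_pkg").mkdir()
(root / "eager_pkg" / "__init__.py").write_text("from . import decomposition\n")
(root / "eager_pkg" / "decomposition.py").write_text("class PCA:\n    pass\n")

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import lazy_pkg
import eager_pkg

# After a bare package import, only eager_pkg exposes its submodule.
lazy_before = hasattr(lazy_pkg, "decomposition")
eager_before = hasattr(eager_pkg, "decomposition")

# Importing the submodule explicitly binds it as an attribute of the package.
import lazy_pkg.decomposition

lazy_after = hasattr(lazy_pkg, "decomposition")

print(lazy_before, eager_before, lazy_after)  # False True True
```

So to find out how a given package behaves without testing every variant, you can read its __init__.py, or simply import the submodule explicitly, which works in either case.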