python python-3.x python-importlib import-hooks

How to implement an import hook that can modify the source code on the fly using importlib?


Using the deprecated module imp, I can write a custom import hook that modifies the source code of a module on the fly, prior to importation/execution by Python. Given the source code as a string named source below, the essential code needed to create a module is the following:

module = imp.new_module(name)
sys.modules[name] = module
exec(source, module.__dict__)

Since imp is deprecated, I would like to do something similar with importlib. [EDIT: there are other imp methods that need to be replaced to build a custom import hook - so the answer I am looking for is not simply to replace the above code.]

However, I have not been able to figure out how to do this. The importlib documentation describes a function that creates modules from "specs" which, as far as I can tell, are objects that carry their own loaders, with no obvious way to redefine them so that a module can be created from a string.
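For the three imp-based lines alone, one possible importlib counterpart is sketched below. It assumes that a spec built with loader=None is acceptable for the use case; the helper name module_from_source and the module name demo_from_string are hypothetical, and this does not by itself amount to an import hook.

```python
import importlib.util
import sys


def module_from_source(name, source):
    """Build a module object from a source string (hypothetical helper)."""
    # A spec with no loader is enough to get an empty, initialised module.
    spec = importlib.util.spec_from_loader(name, loader=None)
    module = importlib.util.module_from_spec(spec)
    # Run the (possibly modified) source in the module's namespace.
    exec(source, module.__dict__)
    # Register the module so later "import name" statements find it.
    sys.modules[name] = module
    return module


demo = module_from_source("demo_from_string", "x = 6 * 7\n")
```

After this runs, demo.x holds the value computed by the source string, and a later import of demo_from_string returns the same module object from sys.modules.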

I have created a minimal example to demonstrate this; see the readme file for details.


Solution

  • find_module and load_module are both deprecated. You'll need to switch to find_spec and to create_module plus exec_module, respectively. See the importlib documentation for details.

    You will also need to decide whether to use a MetaPathFinder or a PathEntryFinder, because the import system invokes them differently. Meta path finders are consulted first and can override even built-in modules, whereas path entry finders handle only modules found on sys.path.

    The following is a very basic importer that attempts to replace the entire import machinery. It shows how to use find_spec, create_module, and exec_module.

    import sys
    import os
    
    from importlib.abc import Loader, MetaPathFinder
    from importlib.util import spec_from_file_location
    
    class MyMetaFinder(MetaPathFinder):
        def find_spec(self, fullname, path, target=None):
            if path is None or path == "":
                path = [os.getcwd()]  # top-level import: search the current directory
            # only the last component of a dotted name is looked up on disk
            name = fullname.rpartition(".")[2]
            for entry in path:
                if os.path.isdir(os.path.join(entry, name)):
                    # this module has child modules
                    filename = os.path.join(entry, name, "__init__.py")
                    submodule_locations = [os.path.join(entry, name)]
                else:
                    filename = os.path.join(entry, name + ".py")
                    submodule_locations = None
                if not os.path.exists(filename):
                    continue
    
                return spec_from_file_location(fullname, filename, loader=MyLoader(filename),
                    submodule_search_locations=submodule_locations)
    
            return None # we don't know how to import this
    
    class MyLoader(Loader):
        def __init__(self, filename):
            self.filename = filename
    
        def create_module(self, spec):
            return None # use default module creation semantics
    
        def exec_module(self, module):
            with open(self.filename) as f:
                data = f.read()
    
            # manipulate data some way...
    
            exec(data, vars(module))
    
    def install():
        """Inserts the finder into the import machinery"""
        sys.meta_path.insert(0, MyMetaFinder())
    

    Next is a slightly more delicate version that attempts to reuse more of the import machinery. As such, you only need to define how to get the source of the module.

    import sys
    from importlib import invalidate_caches
    from importlib.abc import SourceLoader
    from importlib.machinery import FileFinder
    
    
    class MyLoader(SourceLoader):
        def __init__(self, fullname, path):
            self.fullname = fullname
            self.path = path
    
        def get_filename(self, fullname):
            return self.path
    
        def get_data(self, filename):
            """exec_module is already defined for us; we just have to provide a way
            of getting the source code of the module."""
            # get_data is expected to return bytes, so read the file in binary mode
            with open(filename, "rb") as f:
                data = f.read()
            # do something with data ...
            # e.g. ignore it: return b"print('hello world')"
            return data
    
    
    loader_details = MyLoader, [".py"]
    
    def install():
        # insert the path hook ahead of other path hooks
        sys.path_hooks.insert(0, FileFinder.path_hook(loader_details))
        # clear any loaders that might already be in use by the FileFinder
        sys.path_importer_cache.clear()
        invalidate_caches()
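To see the path-hook approach end to end, here is a self-contained sketch under stated assumptions. The loader name MarkerReplacingLoader, the module name hooked_demo, and the __ANSWER__ placeholder are all hypothetical; the transformation is a simple byte replacement chosen only to make the effect visible. The sketch also removes the hook afterwards, since a hook that only knows about ".py" files would otherwise hide extension modules in every directory on sys.path.

```python
import importlib
import os
import sys
import tempfile
from importlib.abc import SourceLoader
from importlib.machinery import FileFinder


class MarkerReplacingLoader(SourceLoader):
    """Hypothetical loader that rewrites a placeholder before compilation."""

    def __init__(self, fullname, path):
        self.fullname = fullname
        self.path = path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, filename):
        # Read the raw source and transform it before it is compiled.
        with open(filename, "rb") as f:
            data = f.read()
        return data.replace(b"__ANSWER__", b"42")


with tempfile.TemporaryDirectory() as tmp:
    # Write a module whose source is not valid until the loader rewrites it.
    with open(os.path.join(tmp, "hooked_demo.py"), "w") as f:
        f.write("value = __ANSWER__\n")

    hook = FileFinder.path_hook((MarkerReplacingLoader, [".py"]))
    sys.path_hooks.insert(0, hook)
    sys.path.insert(0, tmp)
    # Drop cached finders so the new hook is consulted.
    sys.path_importer_cache.clear()
    importlib.invalidate_caches()
    try:
        import hooked_demo
    finally:
        # Restore the default import machinery.
        sys.path_hooks.remove(hook)
        sys.path.remove(tmp)
        sys.path_importer_cache.clear()
```

Because the loader derives from SourceLoader without providing file stats, no bytecode cache is read or written here, so the transformation runs on every fresh import of the module.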