Tags: python, python-3.x, import, python-import, python-importlib

Python: Changing precedence of import file types (.py before .so)


If I do import A from within a directory containing both A.py and A.so, the .so file will be imported. I'm interested in changing the order of import file types, so that .py takes precedence over .so, though only temporarily, i.e. between code line i and j. Surely this can be achieved through some importlib magic?

Currently I get around the issue by copying the .py into a separate directory, prepending this directory to sys.path and then doing the import, which is just awful.

Why the need?

The .so file(s) are cython-compiled versions of the .py files. I'm doing some custom code transformation on top of cython, for which I need to import the .py source even when the "equivalent" .so is present.

Test setup

Here follows a simple test setup.

# A.py
import B

# B.py
import C
print('hello from B')

# C.py
pass

Running python A.py successfully prints out the message from B.py. Now add B.so (as the content of the .so files is irrelevant, having B.so really be a text file is fine):

# B.so
this is a fake binary

Now python A.py fails, because B.so is picked up instead of B.py. Though importlib is the modern way of doing things, so far I only know how to import a specific file directly using the deprecated imp module. Updating A.py to

# A.py
import imp
B = imp.load_source('B', 'B.py')

makes it work again. However, introducing C.so breaks it again, because the preference for .py over .so applies only to that one explicit load and is not registered globally in the import mechanism, so the import C inside B.py still finds C.so first:

# C.so
this is a fake binary

Note that in this example I'm only allowed to edit A.py. I'm in need of a solution for Python 3.8, but I suspect any solution for 3.x works on 3.8 as well.
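As an aside to the imp.load_source call above, here is a minimal sketch of its importlib counterpart, following the recipe from the importlib documentation. The helper name load_source is only illustrative. Like the imp version, it loads a single file directly, so nested imports such as the import C inside B.py still go through the normal lookup and would still pick up C.so. (The imp module is deprecated and was removed in Python 3.12.)

```python
import importlib.util
import sys


def load_source(name, path):
    """Load the source file at `path` as module `name` (illustrative helper)."""
    # Build a module spec that points directly at the given file
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the import system does
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind
        sys.modules.pop(name, None)
        raise
    return module
```

With the test setup above, this would be used in A.py as B = load_source('B', 'B.py').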


Solution

  • I now have a working solution. It's somewhat hacky, but I think it's robust.

    It turns out that sys.path_importer_cache stores various finders, each of which holds a list of loaders (in its private _loaders attribute) that import queries in order. The loaders are stored as 2-tuples, whose first element is exactly the file extension handled by the loader given as the second element.

    I simply traverse all the lists of loaders and push those with the .so extension to the back of each list, giving them the lowest possible precedence (I could remove them completely, but then I couldn't import any .so files at all). I keep track of the changes to sys.path_importer_cache and undo them once I'm done with my special import. All of this is neatly wrapped up in a context manager:

    import collections, contextlib, sys
    
    @contextlib.contextmanager
    def disable_loader(ext):
        ext = '.' + ext.lstrip('.')
        # Push any loaders for the ext extension to the back
        edits = collections.defaultdict(list)
        path_importer_cache = list(sys.path_importer_cache.values())
        for i, finder in enumerate(path_importer_cache):
            loaders = getattr(finder, '_loaders', None)
            if loaders is None:
                continue
            for j, loader in enumerate(loaders):
                if j + len(edits[i]) == len(loaders):
                    break
                if loader[0] != ext:
                    continue
                # Loader for the ext extension found.
                # Push to the back.
                loaders.append(loaders.pop(j))
                edits[i].append(j)
        try:
            # Yield control back to the caller
            yield
        finally:
            # Undo changes to path importer cache
            for i, edit in edits.items():
                loaders = path_importer_cache[i]._loaders
                for j in reversed(edit):
                    loaders.insert(j, loaders.pop())
    
    # Demonstrate import failure
    try:
        import A
    except Exception as e:
        print(e)
    
    # Demonstrate solution
    with disable_loader('.so'):
        import A
    
    # Demonstrate (wanted) failure outside with statement
    import A2
    

    Note that for the import A2 to fail properly, you need to copy the test setup so that you also have A2.py, B2.py, C2.py, B2.so and C2.so, which import each other in the same way as the original test files.

    One can get rid of the somewhat complicated bookkeeping involving edits by taking a complete backup, copy.deepcopy(sys.path_importer_cache), before making the changes and assigning this backup back to sys.path_importer_cache once done. This does work in the limited test above, but since various parts of the import machinery might hold references to the nested objects, I thought it safer to only mutate the existing objects.
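A possible middle ground between the edits bookkeeping and the deepcopy backup is to snapshot each loader list and restore it by slice assignment, so that the list objects themselves are never replaced. The sketch below assumes, as the code above does, that each finder keeps its (suffix, loader) pairs in a plain list named _loaders, which is a private CPython detail and not a public API. The names file_finders and demote_loader are made up for this sketch.

```python
import contextlib
import sys


def file_finders():
    """Yield cached finders that expose a `_loaders` list.

    Relies on the private `_loaders` attribute (an assumption about
    CPython's FileFinder), so it may break in other implementations.
    """
    for finder in list(sys.path_importer_cache.values()):
        if isinstance(getattr(finder, '_loaders', None), list):
            yield finder


@contextlib.contextmanager
def demote_loader(ext):
    """Temporarily give loaders for the `ext` extension lowest precedence."""
    ext = '.' + ext.lstrip('.')
    snapshots = []
    for finder in file_finders():
        loaders = finder._loaders
        # Remember the list object together with a copy of its contents
        snapshots.append((loaders, list(loaders)))
        # Stable partition: entries for other extensions first, `ext` last.
        # Slice assignment mutates the existing list object in place.
        loaders[:] = (
            [entry for entry in loaders if entry[0] != ext]
            + [entry for entry in loaders if entry[0] == ext]
        )
    try:
        yield
    finally:
        # Restore the original order, again without replacing the lists
        for loaders, original in snapshots:
            loaders[:] = original
```

Usage is the same as for disable_loader, for example with demote_loader('.so'): import A. As with the original, only finders already present in sys.path_importer_cache when the with block is entered are affected.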