python setuptools python-packaging

Alias Python package with a different root name when the old one uses lazy imports


I'm currently in the middle of renaming a project. In the meantime I need to avoid breaking existing users, so I'd like to provide an alias that lets them use the package exactly as they did before. This is turning out to be trickier than I initially thought.

Say I have a repo with a package:

├── setup.py
├── pyproject.toml
└── oldpkg
    ├── __init__.py
    ├── __main__.py
    ├── submod.py
    └── subpkg
        ├── __init__.py
        └── nested.py

My first thought was to introduce:

├── newpkg.py

and fill its contents with from oldpkg import *

Then in setup.py add:

        packages=find_packages(include=[
            'oldpkg', 'oldpkg.*',
            # Alias the module while we transition to a new name.
            'newpkg', 'newpkg.*',
        ]),

so both packages are installed.

This works at the top level for import newpkg, but fails if you from newpkg import subpkg.
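
To make that failure concrete, here is a minimal sketch that rebuilds the single-file alias in a temporary directory. The names demo_old and demo_new are hypothetical stand-ins for oldpkg and newpkg, and the old package here has an empty __init__.py, so the star import copies nothing.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway layout: a real package plus a single-file "alias" module.
root = Path(tempfile.mkdtemp())
(root / "demo_old" / "subpkg").mkdir(parents=True)
(root / "demo_old" / "__init__.py").write_text("")
(root / "demo_old" / "subpkg" / "__init__.py").write_text("")
(root / "demo_new.py").write_text("from demo_old import *  # NOQA\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_new  # the top-level alias itself imports fine

errors = {}
try:
    # demo_new has no `subpkg` attribute and no __path__ to search.
    from demo_new import subpkg  # noqa: F401
except ImportError as ex:
    errors["from-import"] = type(ex).__name__
try:
    # A plain module is not a package, so dotted imports cannot resolve.
    importlib.import_module("demo_new.subpkg")
except ImportError as ex:
    errors["dotted-import"] = type(ex).__name__

print(errors)
```

Both forms fail because demo_new.py is a plain module: the star import only copies names that already exist in the old package's namespace, and a module without __path__ cannot contain submodules.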

My next thought was to write a script that, for every Python file in the old package, autogenerates a dummy Python file in the new package containing from <oldpkg>.<subname> import *.

Now this functionally works, but it introduces two namespaces and it has the problem that it actually accesses the attributes in the old package. The old package uses lazy imports, which means it uses the module level __getattr__ to expose submodules dynamically without doing the time consuming work of importing them all at startup.
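
For readers unfamiliar with the pattern, here is a minimal sketch of what such a lazy package can look like. The package name lazydemo and its submodule heavy are hypothetical, and the real package may implement its lazy loading differently.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package whose __init__.py loads submodules on demand.
root = Path(tempfile.mkdtemp())
pkg = root / "lazydemo"
pkg.mkdir()
(pkg / "heavy.py").write_text("VALUE = 42\n")
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _SUBMODULES = {"heavy"}

    def __getattr__(name):
        # Only called when normal attribute lookup on the module fails.
        if name in _SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import lazydemo

loaded_before = "lazydemo.heavy" in sys.modules  # nothing imported yet
value = lazydemo.heavy.VALUE                     # first access triggers the import
loaded_after = "lazydemo.heavy" in sys.modules
```

Importing the package does not import heavy; only the first attribute access does. A mirror file that eagerly does from lazydemo.heavy import * forces that import as soon as the mirror itself is imported, which is the cost the lazy setup was meant to avoid.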

I'm not sure if there is any way around this problem. Really all I want is for the user to be able to use oldpkg and newpkg interchangeably. In fact it would be really great if import oldpkg, newpkg; oldpkg is newpkg was True.

Is there any way that I can write a newpkg that is a strict alias of oldpkg? If possible I'd like the following statements to be functionally equivalent:

# In bash I'd like the CLI to be the same. The `__main__.py` should be mirrored
python -m oldpkg
python -m newpkg
# In Python
import oldpkg
import newpkg

assert oldpkg is newpkg

from oldpkg.subpkg import nested as n1
from newpkg.subpkg import nested as n2

assert n1 is n2

Perhaps the above is not possible, but I'm wondering what the best way to go about this would be. I want to go through the following transition phases:

  1. newpkg is just a pointer to oldpkg.
  2. contents move from oldpkg to newpkg and now oldpkg is a pointer to newpkg.
  3. oldpkg now includes a deprecation warning.
  4. oldpkg now errors when it is imported and tells the user to use newpkg.
  5. oldpkg is removed.
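
As an illustration of phases 3 and 4, this is roughly what the old name's top-level module could contain at each stage. The module names oldpkg_phase3 and oldpkg_phase4 and the message wording are hypothetical; they are written as single files in a temporary directory only so the sketch runs on its own.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Phase 3: the old name still works but warns on import.
(root / "oldpkg_phase3.py").write_text(
    "import warnings\n"
    "warnings.warn(\n"
    "    'oldpkg is deprecated; use newpkg instead',\n"
    "    DeprecationWarning, stacklevel=2)\n"
)
# Phase 4: the old name refuses to import and points at the new one.
(root / "oldpkg_phase4.py").write_text(
    "raise ImportError('oldpkg was renamed; import newpkg instead')\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Record warnings so the phase 3 behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import oldpkg_phase3  # noqa: F401

try:
    import oldpkg_phase4  # noqa: F401
except ImportError as ex:
    phase4_message = str(ex)
```

Note that DeprecationWarning is hidden by default in many contexts, so a shim that must be seen by end users might prefer a more visible category; that choice is left open here.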

Is there any prior art on accomplishing this transition plan?


Solution

  • I've come up with a script that autogenerates the files necessary to handle most cases. It still has issues with python -m, but I think I can figure out a way to make it work. I will update the answer when I finish that.

    The following script creates another module with exactly the same files as yours and adds a __getattr__ to each one, so that all module attributes refer back to the original module.

    import networkx as nx
    import ubelt as ub
    
    
    new_name = 'NEW_MODULE'
    module = ub.import_module_from_name('OLD_MODULE')
    module_dpath = ub.Path(module.__file__).parent
    repo_dpath = module_dpath.parent
    
    g = nx.DiGraph()
    g.add_node(module_dpath, label=module_dpath.name, type='dir')
    
    for root, dnames, fnames in module_dpath.walk():
        # dnames[:] = [d for d in dnames if not dname_block_pattern.match(d)]
        if '__init__.py' not in fnames:
            dnames.clear()
            continue
    
        g.add_node(root, label=root.name, type='dir')
        if root != module_dpath:
            g.add_edge(root.parent, root)
    
        # for d in dnames:
        #     dpath = root / d
        #     g.add_node(dpath, label=dpath.name)
        #     g.add_edge(root, dpath)
    
        for f in fnames:
            if f.endswith('.py'):
                fpath = root / f
                g.add_node(fpath, label=fpath.name, type='file')
                g.add_edge(root, fpath)
    
    for p in list(g.nodes):
        node_data = g.nodes[p]
        ntype = node_data.get('type', None)
        if ntype == 'dir':
            node_data['label'] = ub.color_text(node_data['label'], 'blue')
        elif ntype == 'file':
            node_data['label'] = ub.color_text(node_data['label'], 'green')
    
    nx.write_network_text(g)
    
    for node, node_data in g.nodes(data=True):
        if node_data['type'] == 'file':
            relpath = node.relative_to(module_dpath)
            new_fpath = repo_dpath / new_name / relpath
            new_fpath.parent.ensuredir()
            modname = ub.modpath_to_modname(node)
            print(f'new_fpath={new_fpath}')
            if new_fpath.name == '__main__.py':
                new_fpath.write_text(ub.codeblock(
                    f'''
                    from {modname} import *  # NOQA
                    '''))
            else:
                new_fpath.write_text(ub.codeblock(
                    f'''
                    # Autogenerated via:
                    # python dev/maintain/mirror_package.py
                    def __getattr__(key):
                        import {modname} as mirror
                        return getattr(mirror, key)
                    '''))
    

    It's annoying that this takes so much boilerplate, but it works, and it is maintainable via a script.
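
For the python -m case, one possible approach (a sketch, not something the script above does) is to have the generated __main__.py delegate through the standard library's runpy.run_module instead of a star import. If the old __main__.py guards its entry point with if __name__ == '__main__', a star import never triggers it, because the old file is then imported under its dotted name. The package names demo_oldcli and demo_newcli below are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Two throwaway packages: the "old" one owns the CLI, the "new" one mirrors it.
root = Path(tempfile.mkdtemp())
for name in ("demo_oldcli", "demo_newcli"):
    (root / name).mkdir()
    (root / name / "__init__.py").write_text("")

(root / "demo_oldcli" / "__main__.py").write_text(
    "if __name__ == '__main__':\n"
    "    print('hello from the old CLI')\n"
)
# The mirror re-runs the old package as if it had been started with -m.
(root / "demo_newcli" / "__main__.py").write_text(
    "import runpy\n"
    "runpy.run_module('demo_oldcli', run_name='__main__', alter_sys=True)\n"
)

env = dict(os.environ, PYTHONPATH=str(root))
result = subprocess.run(
    [sys.executable, "-m", "demo_newcli"],
    cwd=root, env=env, capture_output=True, text=True, check=True,
)
cli_output = result.stdout.strip()
print(cli_output)
```

Passing a package name to run_module executes that package's __main__ submodule, and run_name='__main__' makes the guard in the old file evaluate to true.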