Tags: python, package, python-module, side-effects, purely-functional

Check Contents of Python Package without Running it?


I would like a function that, given a name which caused a NameError, can identify Python packages which could be imported to resolve it.

That part is fairly easy, and I've done it, but now I have an additional problem: I'd like to do it without causing side-effects. Here's the code I'm using right now:

def necessaryImportFor(name):
    from pkgutil import walk_packages
    for package in walk_packages():
        if package[1] == name:
            return name
        try:
            if hasattr(__import__(package[1]), name):
                return package[1]
        except Exception as e:
            print("Can't check " + package[1] + " on account of a " + e.__class__.__name__ + ": " + str(e))
    print("No possible import satisfies " + name)

The problem is that this code actually __import__s every module. This means that every side-effect of importing every module occurs. When testing my code I found that side-effects that can be caused by importing all modules include:

  • Launching tkinter applications
  • Requesting passwords with getpass
  • Requesting other input or raw_input
  • Printing messages (import this)
  • Opening websites (import antigravity)

A possible solution that I considered would be finding the path to every module (how? It seems to me that the only way to do this is by importing the module then using some methods from inspect on it), then parsing it to find every class, def, and = that isn't itself within a class or def, but that seems like a huge PITA and I don't think it would work for modules which are implemented in C/C++ instead of pure Python.

Another possibility is launching a child Python instance which has its output redirected to devnull and performing its checks there, killing it if it takes too long. That would solve the first four bullets, and the fifth one is such a special case that I could just skip antigravity. But having to start up thousands of instances of Python in this single function seems a bit... heavy and inefficient.
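
For reference, the child-process idea could look roughly like the sketch below. It assumes Python 3's subprocess.run; the helper name has_name_in_child, the exit code 3, and the 10-second default timeout are made up for illustration. It still starts one interpreter per module checked, so the cost concern stands.

```python
import subprocess
import sys

# Code run in the child: import the module and report via the exit code.
# Exit code 3 is an arbitrary choice; an uncaught exception exits with 1.
_CHILD_CODE = (
    "import importlib, sys\n"
    "mod = importlib.import_module(sys.argv[1])\n"
    "sys.exit(0 if hasattr(mod, sys.argv[2]) else 3)\n"
)


def has_name_in_child(module_name, name, timeout=10.0):
    """Return True/False if the check completed, None if it failed or timed out."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, module_name, name],
            stdin=subprocess.DEVNULL,   # input()/getpass see end-of-file
            stdout=subprocess.DEVNULL,  # printed messages are discarded
            stderr=subprocess.DEVNULL,
            timeout=timeout,            # the child is killed if it hangs
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode == 0:
        return True
    if result.returncode == 3:
        return False
    return None  # import failed or the child crashed
```

Usage would be something like has_name_in_child("os", "getcwd"). Side effects that reach outside the process, such as opening a browser, are not contained by this.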

Does anyone have a better solution I haven't considered? Is there a simple way of just telling Python to generate an AST or something without actually importing a module, for example?


Solution

  • So I ended up writing a few methods which can list everything from a source file, without importing the source file.

    The ast module doesn't seem particularly well documented, so this was a bit of a PITA trying to figure out how to extract everything of interest. Still, after ~6 hours of trial and error today, I was able to get this together and run it on the 3000+ Python source files on my computer without any exceptions being raised.

    def listImportablesFromAST(ast_):
        from ast import (Assign, ClassDef, FunctionDef, Import, ImportFrom, Name,
                         For, Tuple, TryExcept, TryFinally, With)
    
        if isinstance(ast_, (ClassDef, FunctionDef)):
            return [ast_.name]
        elif isinstance(ast_, (Import, ImportFrom)):
            return [name.asname if name.asname else name.name for name in ast_.names]
    
        ret = []
    
        if isinstance(ast_, Assign):
            for target in ast_.targets:
                if isinstance(target, Tuple):
                    ret.extend([elt.id for elt in target.elts])
                elif isinstance(target, Name):
                    ret.append(target.id)
            return ret
    
        # These two attributes cover everything of interest from If, Module,
        # and While. They also cover parts of For, TryExcept, TryFinally, and With.
        if hasattr(ast_, 'body') and isinstance(ast_.body, list):
            for innerAST in ast_.body:
                ret.extend(listImportablesFromAST(innerAST))
        if hasattr(ast_, 'orelse'):
            for innerAST in ast_.orelse:
                ret.extend(listImportablesFromAST(innerAST))
    
        if isinstance(ast_, For):
            target = ast_.target
            if isinstance(target, Tuple):
                ret.extend([elt.id for elt in target.elts])
            else:
                ret.append(target.id)
        elif isinstance(ast_, TryExcept):
            for innerAST in ast_.handlers:
                ret.extend(listImportablesFromAST(innerAST))
        elif isinstance(ast_, TryFinally):
            for innerAST in ast_.finalbody:
                ret.extend(listImportablesFromAST(innerAST))
        elif isinstance(ast_, With):
            if ast_.optional_vars:
                ret.append(ast_.optional_vars.id)
        return ret
    
    def listImportablesFromSource(source, filename = '<Unknown>'):
        from ast import parse
        return listImportablesFromAST(parse(source, filename))
    
    def listImportablesFromSourceFile(filename):
        with open(filename) as f:
            source = f.read()
        return listImportablesFromSource(source, filename)
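
A note for Python 3 readers: the code above targets Python 2, where try statements parse to TryExcept and TryFinally nodes. Python 3 merged both into ast.Try and moved the targets of with statements into node.items, so the import at the top of listImportablesFromAST fails there. Below is a rough Python 3 equivalent, as a sketch only. The names top_level_names and _target_names are hypothetical, and unlike the code above it reports `import os.path` as binding `os` and skips `from x import *`.

```python
import ast


def _target_names(target):
    """Yield plain names bound by an assignment-like target, unpacking nested tuples."""
    if isinstance(target, ast.Name):
        yield target.id
    elif isinstance(target, (ast.Tuple, ast.List)):
        for elt in target.elts:
            yield from _target_names(elt)
    elif isinstance(target, ast.Starred):
        yield from _target_names(target.value)
    # Attribute and Subscript targets bind no module-level name, so they are skipped.


def top_level_names(source, filename="<unknown>"):
    """List names a module would define at top level, without executing it."""
    names = []

    def visit(stmts):
        for node in stmts:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.append(node.name)  # do not descend into the body
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    names.append(alias.asname or alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom):
                names.extend(a.asname or a.name for a in node.names if a.name != "*")
            elif isinstance(node, ast.Assign):
                for target in node.targets:
                    names.extend(_target_names(target))
            elif isinstance(node, (ast.If, ast.While)):
                visit(node.body)
                visit(node.orelse)
            elif isinstance(node, (ast.For, ast.AsyncFor)):
                names.extend(_target_names(node.target))
                visit(node.body)
                visit(node.orelse)
            elif isinstance(node, (ast.With, ast.AsyncWith)):
                for item in node.items:
                    if item.optional_vars is not None:
                        names.extend(_target_names(item.optional_vars))
                visit(node.body)
            elif isinstance(node, ast.Try):
                visit(node.body)
                for handler in node.handlers:
                    visit(handler.body)
                visit(node.orelse)
                visit(node.finalbody)

    visit(ast.parse(source, filename).body)
    return names
```

Names can appear more than once (for example a try/except fallback that assigns the same name twice), so wrap the result in set() if only membership matters.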
    

    The above code covers the titular question: How do I check the contents of a Python package without running it?

    But it leaves you with another question: How do I get the path to a Python package from just its name?

    Here's what I wrote to handle that:

    class PathToSourceFileException(Exception):
        pass
    
    class PackageMissingChildException(PathToSourceFileException):
        pass
    
    class PackageMissingInitException(PathToSourceFileException):
        pass
    
    class NotASourceFileException(PathToSourceFileException):
        pass
    
    def pathToSourceFile(name):
        '''
        Given a name, returns the path to the source file, if possible.
        Otherwise raises an ImportError or subclass of PathToSourceFileException.
        '''
    
        from os.path import dirname, isdir, isfile, join
    
        if '.' in name:
            parentSource = pathToSourceFile('.'.join(name.split('.')[:-1]))
            path = join(dirname(parentSource), name.split('.')[-1])
            if isdir(path):
                path = join(path, '__init__.py')
                if isfile(path):
                    return path
                raise PackageMissingInitException()
            path += '.py'
            if isfile(path):
                return path
            raise PackageMissingChildException()
    
        from imp import find_module, PKG_DIRECTORY, PY_SOURCE
    
        f, path, (suffix, mode, type_) = find_module(name)
        if f:
            f.close()
        if type_ == PY_SOURCE:
            return path
        elif type_ == PKG_DIRECTORY:
            path = join(path, '__init__.py')
            if isfile(path):
                return path
            raise PackageMissingInitException()
        raise NotASourceFileException('Name ' + name + ' refers to the file at path ' + path + ' which is not that of a source file.')
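
The imp module used above is deprecated in Python 3 and was removed in 3.12. Here is a minimal sketch of the same lookup with importlib.machinery.PathFinder, assuming only path-based (file system) modules matter. The name source_path_for is hypothetical, and it returns None instead of raising the exceptions defined above.

```python
from importlib.machinery import PathFinder, SOURCE_SUFFIXES


def source_path_for(name):
    """Return the source file path for a dotted module name, or None.

    Nothing is imported: each component is located on disk, and a package's
    directory list is used as the search path for the next component.
    """
    parts = name.split(".")
    search_path = None  # None makes PathFinder search sys.path
    spec = None
    for depth in range(1, len(parts) + 1):
        if depth > 1 and search_path is None:
            return None  # the previous component was a plain module, not a package
        spec = PathFinder.find_spec(".".join(parts[:depth]), search_path)
        if spec is None:
            return None  # not found on the search path
        search_path = spec.submodule_search_locations
    origin = spec.origin
    if origin and origin.endswith(tuple(SOURCE_SUFFIXES)):
        return origin  # for a package this is its __init__.py
    return None  # C extension, namespace package, or other non-source module
```

Walking the components by hand matters here because importlib.util.find_spec imports parent packages when given a dotted name, which would bring the side effects back.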
    

    Tying the two bits of code together, I have this function:

    def listImportablesFromName(name, allowImport = False):
        try:
            return listImportablesFromSourceFile(pathToSourceFile(name))
        except PathToSourceFileException:
            if not allowImport:
                raise
            return dir(__import__(name))
    

    Finally, here's the implementation for the function that I mentioned I wanted in my question:

    def necessaryImportFor(name):
        packageNames = []
    
        def nameHandler(name):
            packageNames.append(name)
    
        from pkgutil import walk_packages
        for package in walk_packages(onerror=nameHandler):
            nameHandler(package[1])
        # Suggestion: Sort package names by count of '.', so shallower packages are searched first.
        for package in packageNames:
            # Suggestion: just skip any package that starts with 'test.'
            try:
                if name in listImportablesFromName(package):
                    return package
            except ImportError:
                pass
            except PathToSourceFileException:
                pass
        return None
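
The two suggestions in the comments above could be sketched like this. order_candidates is a hypothetical helper, not part of the code above; it would be applied to packageNames before the search loop.

```python
def order_candidates(package_names):
    """Drop 'test.' subpackages and put shallower package names first."""
    kept = [n for n in package_names if not n.startswith("test.")]
    # sorted() is stable, so names at the same depth keep their original order.
    return sorted(kept, key=lambda n: n.count("."))
```

With this ordering, a name defined in both a top-level package and one of its submodules resolves to the top-level package.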
    

    And that's how I spent my Sunday.