
How do I determine which requirements are actually needed in setup.py?


I'm cleaning up packaging for a Python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.

Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?

As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.


Solution

  • There's no way to do this perfectly, because Python is too flexible.

    But it's usually possible to do it well enough.

    You can start with the stdlib's modulefinder.
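
    As a rough sketch of what starting with modulefinder might look like: the helper name third_party_candidates is made up for illustration, and the stdlib filter assumes Python 3.10+ (sys.stdlib_module_names doesn't exist before that). Treat the result as a list of candidates to check by hand, not as a finished install_requires.

    ```python
    import sys
    from modulefinder import ModuleFinder


    def third_party_candidates(script_path):
        """Top-level names a script pulls in, minus anything known to be stdlib."""
        finder = ModuleFinder()
        finder.run_script(str(script_path))
        # finder.modules: modules modulefinder located on the current sys.path.
        # finder.any_missing(): names it saw imported but could not locate,
        # which is where uninstalled third-party packages tend to show up.
        names = set(finder.modules) | set(finder.any_missing())
        top_level = {name.split(".")[0] for name in names} - {"__main__"}
        # sys.stdlib_module_names needs Python 3.10+; older versions filter nothing.
        return top_level - set(getattr(sys, "stdlib_module_names", ()))
    ```

    You'd call this once per entry point, e.g. third_party_candidates("path/to/script.py"). Because modulefinder follows imports into whatever is installed, the result also includes dependencies of your dependencies, so you still have to prune it down to the packages your own code imports directly.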

    Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.

    These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:


    In case you're wondering why it's impossible:

    Even forgetting about the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.

    Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml:[1]

    import codecs
    import sys
    with open('picture.jpg', encoding='cp500') as f:
        getattr(list(sys.modules.values())[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
    

    In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
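
    If you want one step up from grepping, here's a minimal sketch that parses each file with the stdlib's ast module instead of matching text. The function names are hypothetical, and it has exactly the limitation described above: it only sees real import statements, so anything done through __import__, importlib, or tricks like the one above slips past it.

    ```python
    import ast
    import sys
    from pathlib import Path


    def imported_top_level_names(source):
        """Collect top-level module names from import statements in source text."""
        names = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # level > 0 is a relative import, i.e. the project's own code.
                names.add(node.module.split(".")[0])
        return names


    def scan_tree(root):
        """Union of imported names across every .py file under root, minus stdlib."""
        found = set()
        for path in Path(root).rglob("*.py"):
            try:
                found |= imported_top_level_names(path.read_text(encoding="utf-8"))
            except (SyntaxError, UnicodeDecodeError):
                continue  # skip files that don't parse; worth a manual look
        # sys.stdlib_module_names needs Python 3.10+; older versions filter nothing.
        return found - set(getattr(sys, "stdlib_module_names", ()))
    ```

    Note that this gives you import names, not distribution names. The two often differ (import yaml comes from the PyYAML distribution), and the project's own packages will show up in the output too, so the list still needs a manual pass.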

    You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
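
    A simple import hook could look something like the sketch below. The names ImportRecorder and recording_imports are made up for illustration. It puts a finder at the front of sys.meta_path that writes down every name it's asked about and then declines, so the real finders still do the actual importing.

    ```python
    import sys
    from contextlib import contextmanager


    class ImportRecorder:
        """Meta path finder that records requested names and never finds anything."""

        def __init__(self):
            self.seen = set()

        def find_spec(self, fullname, path=None, target=None):
            self.seen.add(fullname.split(".")[0])
            return None  # decline, so the next finder on sys.meta_path handles it


    @contextmanager
    def recording_imports():
        """Record top-level names of modules imported inside the with block."""
        recorder = ImportRecorder()
        sys.meta_path.insert(0, recorder)
        try:
            yield recorder.seen
        finally:
            sys.meta_path.remove(recorder)
    ```

    You'd wrap your test suite or the app's entry point in with recording_imports() as seen: and look at seen afterwards. Besides the code-coverage problem, there's one more catch: modules already in sys.modules never reach the finders, so the hook has to be installed before anything interesting gets imported.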


    1. Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a text file whose contents are, in EBCDIC, lxml\n.