python, pip, requirements.txt

Use >= or ~= for compatibility across systems?


My goal is a simple and proper way to export my venv. In the optimal case, the resulting requirements.txt works on all compatible systems.

At the moment I use pip freeze > requirements.txt. This uses the == "Version matching clause". On another system the file might fail to install because of conflicting versions, even though that system is otherwise compatible.

In PEP 440 there is also a ~= "Compatible release clause". However, I cannot find an option for that in the pip freeze docs. Using "find and replace" or a tool like awk to replace == with ~= works okay.

My naive conclusion is that ~= would be the ideal clause to use in requirements.txt. However, when I look at popular packages, they often use >= to specify a version; urllib3 is one example.

Is there a drawback to ~=, which I do not see?
If that is not the case: Why is >= used in so many packages?

Edit:
Pigar has an option to use >= natively and there is a comparison to freeze here. Apparently, they also do not use ~=.
Yet, I am still not sure which one to use, as >= could break when there is a major version change. Also, packages at a lower minor version would be marked incompatible, although they should be compatible.


Solution

  • Your question is not simple to answer and touches on some nuances in the social dynamics around versioning.

    Easy stuff first: sometimes versions carry a terminal suffix that marks something like a prerelease or post-release build. If you depend on such a build, or on any other situation where you expect the terminal suffix to iterate repeatedly (especially in a non-ordered way), ~= helps by letting you accept all iterations on a build. PEP 440 contains a good example, in which the first line is equivalent to the second:

    ~= 2.2.post3
    >= 2.2.post3, == 2.*
    

    Second, pip freeze is not meant to be used to generate a requirements list. It just dumps a list of everything you've currently got, which is far more than actually needs to go in a requirements file. So it makes sense that it only uses ==: its purpose is to let you replicate a set of packages in an 'identical' environment elsewhere.


    Hard stuff next. Under semantic versioning, the only backwards-incompatible revisions should be major revisions. (This depends on how much you trust the maintainer - put a pin in that.) However, if you specify a patch number, ~= won't upgrade to a new minor rev even if one is available and it should, in principle, be backwards-compatible. This is important to talk about clearly, because "compatible release" has two different meanings: in semantic versioning, a "compatible release" is (colloquially) any rev between this one and the next major rev; in PEP 440 specifiers, a "compatible release" is one that matches every version component you specified except the last, which may only increase (so ~= 0.7.10 accepts 0.7.11 but not 0.8).
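
    To make the difference concrete, here is a minimal sketch of how ~= behaves depending on how many version components you give it. It is not pip's implementation: the helper names (parse, pad, is_compatible) are made up for illustration, and it only handles plain numeric releases, ignoring suffixes such as .post3 or rc1.

```python
def parse(version):
    """Turn a plain dotted release such as '0.7.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(release, length):
    """Right-pad with zeros so that 0.8 and 0.8.0 compare as equal."""
    return release + (0,) * (length - len(release))


def is_compatible(candidate, spec):
    """Simplified check for `candidate ~= spec` (numeric releases only)."""
    cand, base = parse(candidate), parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two components, e.g. 2.2")
    width = max(len(cand), len(base))
    cand, base_padded = pad(cand, width), pad(base, width)
    prefix = base[:-1]  # everything except the last component given
    # Must be at least the stated version AND share the stated prefix.
    return cand >= base_padded and cand[:len(prefix)] == prefix


print(is_compatible("0.7.11", "0.7.10"))  # True: patch bump is accepted
print(is_compatible("0.8.0", "0.7.10"))   # False: minor bump is rejected
print(is_compatible("0.8.0", "0.7"))      # True: ~= 0.7 allows any 0.x >= 0.7
print(is_compatible("1.0", "0.7"))        # False: major bump is rejected
```

    In other words, ~= 0.7.10 pins the minor version, while ~= 0.7 only pins the major version, which is much closer to the semantic versioning notion of "compatible."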

    Let me be clear now: when I say "backwards-compatible," I mean it in the semantic versioning sense only. (If the package in question doesn’t use semantic versioning, or has a fourth version number, well - generally ~= will still match all patches, but check to be sure.)

    So, there's a trade to be made between >= and ~=, and it has to do with chains of trust in dependency management. Here are three principles - then after, I'll offer some speculation on why so many package maintainers use >=.

    1. In general, it's the responsibility of a package maintainer to ensure that all version numbers matching their requirements.txt are compatible with that package, with the occasional exception of deprecated patch revs. This includes ensuring that the requirements.txt is as small as possible and contains only that package's requirements. (More broadly, “require as little as possible and validate it as much as possible.”)

    2. In general, no matter the language and no matter the package, dependencies reflect a chain of trust. I am implementing a package; I trust you to maintain your package (and its requirements file) in a way that continues to function. You are trusting your dependencies to maintain their packages in a way that continues to function. In turn, your downstream consumers are expecting you to maintain your package in a way that means it continues to function for them. This is based on human trust. The number is 'just' a convenient communication tool.

    3. In general, no matter the change set, package maintainers try extremely hard to avoid major versions. No one wants to be the guy who releases a major rev and forces consumers to version their package through a substantial rewrite - or consign their projects to an old and unsupported version. We accept major revs as necessary (that's why we have systems to track them), but folks are typically loath to use them until they really don't have another option.

    Synthesize these three. From the perspective of a package maintainer, supposing one trusts the maintainers one is dependent upon (as one should), it is broadly speaking more reasonable to expect major revisions to be rare, than it is to expect minor revisions to be backwards-incompatible by accident. This means the number of reactive updates you'll need to make in the >= scheme should be small (but, of course, nonzero).


    That's a lot of groundwork. I know this is long, but this is the good part: the trade.

    For example, suppose I developed a package, helloworld == 0.7.10. You developed a package atop helloworld == 0.7.10, and then I later rev helloworld to 0.8. Let's start by considering the best-case situation: that I am still offering support for the 0.7.10 version and (ex.) patch it to 0.7.11 at a later date, even while maintaining 0.8 separately. This allows your downstream consumers to accept patches without losing compatibility with your package, even when using ~=. And you are "guaranteed" that future patches won't break your current implementation or require maintenance in the event of mistakes - I’m doing that work for you. Of course, this only works if I go to the trouble of maintaining both 0.7 and 0.8, but this does seem advantageous...

    So, why does it break? Well, one example. What happens if you specify helloworld ~= 0.7.10 in your package, but another upstream dependency of yours (that isn't me!) upgrades, and now uses helloworld >= 0.8.1? Since you relied on a minor version's compatibility requirements, there's now a conflict. Worse, what if a consumer of your package wants to use new features from helloworld == 0.8.1 that aren't available in 0.7? They can't.

    But remember, a semver-compliant package built on helloworld v0.7 should be just fine running on helloworld v0.8 - there should be no breaking changes. It's your specification of ~= that is the most likely to have broken a dependency or consumer need for no good reason - not helloworld.

    If instead you had used helloworld >= 0.7.10, then you would've allowed for the installation of 0.8, even when your package was not explicitly written using it. If 0.8 doesn't break your implementation, which is supposed to be true, then allowing its use would be the correct manual decision anyway. You don't even necessarily need to know what I'm doing or how I'm writing 0.8, because minor versions should only be adding functionality - functionality you're obviously not using, but someone else might want to.
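
    The conflict can be sketched with a toy resolver. Everything here is hypothetical: the list of helloworld releases is invented to match the example, and the helpers (at_least, compatible_with, resolve) only imitate >= and ~= for three-part numeric versions; pip's real resolver is far more involved.

```python
def parse(version):
    """Turn a plain dotted release such as '0.8.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical helloworld releases available on the package index.
AVAILABLE = ["0.7.10", "0.7.11", "0.8.0", "0.8.1"]


def at_least(minimum):
    """Imitates `>= minimum`."""
    floor = parse(minimum)
    return lambda v: parse(v) >= floor


def compatible_with(spec):
    """Imitates `~= spec` for three-part versions: same major.minor, >= spec."""
    base = parse(spec)
    return lambda v: parse(v) >= base and parse(v)[:-1] == base[:-1]


def resolve(*constraints):
    """Return every available release that satisfies all constraints."""
    return [v for v in AVAILABLE if all(check(v) for check in constraints)]


# Your package says ~= 0.7.10, another dependency says >= 0.8.1.
pinned_minor = resolve(compatible_with("0.7.10"), at_least("0.8.1"))
# Your package says >= 0.7.10 instead.
open_ended = resolve(at_least("0.7.10"), at_least("0.8.1"))

print(pinned_minor)  # []
print(open_ended)    # ['0.8.1']
```

    With ~= there is no release that satisfies both requirements, so installation fails; with >= the two requirements overlap and 0.8.1 can be installed.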

    The chain of trust is leaky, though. As the maintainer of helloworld, I might not know for certain whether my revision 0.8 introduces bugs or potential issues that could interfere with the usage of a package originally written for 0.7. Sure, by naming it 0.8 and not 1.0, I claim that I will (and should be expected to!) provide patches to helloworld as needed to address failures to maintain backwards-compatibility. But in practice, that might become untenable, or simply not happen, especially in the very unusual case (joke) where a package does not have rigorous unit and regression tests.

    So your trade, as a package developer and maintainer, boils down to this: Do you trust me, the maintainer of helloworld, to infrequently release major revs, and to ensure that minor revs do not risk breaking backwards-compatibility, more than you need your downstream consumers to be guaranteed a stable release?


    Using >= means:

    • (Rare): If I release a major rev, you'll need to update your requirements file to specify which major rev you are referring to.
    • (Uncommon): If I release a minor rev, but a bug, review, regression failure, etc. cause that minor rev to break packages built atop old versions, you'll either need to update your requirements file to specify which minor rev you are referring to, or wait for me to patch it further. (What if I decline to patch it further, or worse, take my sweet time doing so?)

    Using ~= means:

    • If any of your upstream packages end up using a different minor revision than the one your package was originally built to use, you risk a dependency conflict between you and your upstream providers.
    • If any of your downstream consumers want or need to use features introduced in a later minor revision of a package you depend upon, they can't - not without overriding your requirements file and hoping for the best.
    • If I stop supporting a minor revision of a package you use, and release critical patches on a future minor rev only, you and your consumers won't get them. (What if these are important, ex. security updates? urllib3 could be a great example.)

    If those 'rare' or 'uncommon' events are so disruptive to your project that you just can't conceive of a world in which you'd want to take that risk, use ~=, even at the cost of convenience/security to your downstream consumers. But if you want to give downstream consumers the most flexibility possible, don't mind dealing with the occasional breaking-change event, and want to make sure your own code typically runs on the most recent version it can, using >= is the safer way to go. It's usually the right decision, anyway.

    For this reason, I expect most maintainers deliberately use >= most of the time. Or maybe it's force of habit. Or maybe I'm just reading too much into it.