Search code examples
pythondeploymentpipbinaries

What are the reasons for not hosting a compiler on a live server?


Where I currently work we've had a small debate about deploying our Python code to the production servers. I voted to build binary dependencies (like the python mysql drivers) on the server itself, just using pip install -r requirements.txt. This was quickly vetoed with no better explanation that "we don't put compilers on the live servers". As a result our deployment process is becoming convoluted and over-engineered simply to avoid this compilation step.

My question is this: What's the reason these days to avoid having a compiler on live servers?


Solution

  • In general, the prevailing wisdom on servers installs is that they should be as stripped-down as possible. There are a few motivations for this, but they don't really apply all that directly to your question about a compiler:

    • Minimize resource usage. GCC might take up a little extra disk space, but probably not enough to matter - and it won't be running most of the time, so CPU/memory usage isn't a big concern.
    • Minimize complexity. Building on your server might add a few more failure modes to your build process (if you build elsewhere, then at least you will notice something wrong before you go mess with your production server), but otherwise, it won't get in the way.
    • Minimize attack surface. As others have pointed out, by the time an attacker can make use of a compiler, you're probably already screwed..

    At my company, we generally don't care too much if compilers are installed on our servers, but we also never run pip on our servers, for a rather different reason. We're not so concerned about where packages are built, but when and how they are downloaded.

    The particularly paranoid among us will take note that pip (and easy_install) will happily install packages from PYPI without any form of authentication (no SSL, no package signatures, ...). Further, many of these aren't actually hosted on PYPI; pip and easy_install follow redirects. So, there are two problems here:

    • If pypi - or any of the other sites on which your dependencies are hosted - goes down, then your build process will fail
    • If an attacker somehow manages to perform a man-in-the-middle attack against your server as it's attempting to download a dependency package, then he'll be able to insert malicious code into the download

    So, we download packages when we first add a dependency, do our best to make sure the source is genuine (this is not foolproof), and add them into our own version-control system. We do actually build our packages on a separate build server, but this is less crucial; we simply find it useful to have a binary package we can quickly deploy to multiple instances.