Tags: python, docker, pipenv, python-venv

Why should I use PIPENV_VENV_IN_PROJECT in docker?


I am trying to understand why people use PIPENV_VENV_IN_PROJECT in a dockerized application.

From Pipenv documentation:

You might want to set export PIPENV_VENV_IN_PROJECT=1 in your .bashrc/.zshrc (or any shell 
configuration file) for creating the virtualenv inside your project’s directory, avoiding 
problems with subsequent path changes.
How is that relevant when my application is dockerized?

As far as I understand, I can install my pipenv dependencies with the --system flag, since I am already in a virtual environment: docker!
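Concretely, the kind of Dockerfile I have in mind looks roughly like this (just a sketch of what I mean; python:3.10-slim and app.py are placeholders, and I assume a Pipfile/Pipfile.lock at the build context root):

    FROM python:3.10-slim

    WORKDIR /app
    COPY Pipfile Pipfile.lock ./

    # Install the locked dependencies straight into the image's system Python
    RUN pip install pipenv && pipenv install --system --deploy

    COPY . .
    CMD ["python", "app.py"]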


Solution

  • The best short answers for this are here and here.

    The basics of it is the portability of the app itself, since it's isolated from the OS. Docker only isolates the OS from the host OS. Venvs isolate Python. Not only from other apps, but from the OS itself. You can literally move that venv from one OS to another without ever affecting the app itself (almost). No need to change app dependencies due to a change in OS. The app will run the same on Ubuntu as it does on Alpine as it does on Windows (if the app is truly OS independent). Why risk breaking the app going from Ubuntu 18.04 to 20.04, or worse, from 18.04 to 21.04, which all ship with a completely different set of Python libraries?

    As the above link shows, on Ubuntu, which ships with Python by default as a dependency of apt, you can potentially break apt or the app itself with an update of the OS or vice versa. Then you have the pip vs pip3 vs python vs python3 CLI command differences. PITA. Just negate ANY question and anomaly by using venvs.

    UPDATE: I misspoke. The Ubuntu docker images don't ship with Python pre-installed, so there's no system/apt dependency on Python there. But the VM/cloud images do. And without adding Deadsnakes, you're not getting anything higher than 3.8. I know entire dev/IT teams that flat out refuse to use PPAs, and for good reason.

    Heck, there are projects out there that use docker itself as a development environment and don't rely on the developer's own machine. The Cookiecutter template for Django for example.

    Best practice? Use a venv. Just because it's a docker image doesn't mean best practices shouldn't be followed. It's a best practice for a reason. Try it. Install the app on system Python on Ubuntu 18.04. Then at the top of your Dockerfile change 18.04 to 21.04, or better, Alpine... and watch your app break without ever touching its code. You'll be extremely lucky if it doesn't. Your app will undoubtedly have a different dependency tree than the OS does. I bring up Ubuntu because, notably, it tends to pin itself to one specific version of Python. The very reason why venvs exist in the first place. Ubuntu in a docker image is no different than Ubuntu on your dev machine. The same Python dependencies the OS has on your machine exist in that docker image.
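    To make that experiment concrete, the system-Python setup being warned against looks something like this (just a sketch; the image tag, requirements.txt and app.py are placeholders):

    # The anti-pattern: the app piggybacks on the distro's Python and its packages.
    FROM ubuntu:18.04
    RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip

    WORKDIR /app
    COPY requirements.txt .
    # Everything lands in the distro's site-packages, right next to whatever apt put there.
    RUN pip3 install -r requirements.txt

    COPY . .
    # Now bump the FROM line to 21.04 (or swap distro) and see what survives.
    CMD ["python3", "app.py"]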

    The ONLY place I could see a venv not being worth it is with the Google "distroless" images, which only include the libraries the app itself needs. No package managers, nothing. Just the core OS files needed to run that Python runtime. IMO, the distroless images aren't really needed nowadays. Between Alpine and the Debian slims, images are already small and extremely portable from one host OS to another. No need to deal with the huge caveat and headache of a distroless image that lacks shell access, making debugging a PITA.

    In addition, have a look at The Twelve Factors. IMO, the gold standard for app development.

    A twelve-factor app never relies on implicit existence of system-wide packages. It declares all dependencies, completely and exactly, via a dependency declaration manifest. Furthermore, it uses a dependency isolation tool during execution to ensure that no implicit dependencies “leak in” from the surrounding system. The full and explicit dependency specification is applied uniformly to both production and development.

    This quote is very telling:

    Twelve-factor apps also do not rely on the implicit existence of any system tools. Examples include shelling out to ImageMagick or curl. While these tools may exist on many or even most systems, there is no guarantee that they will exist on all systems where the app may run in the future, or whether the version found on a future system will be compatible with the app. If the app needs to shell out to a system tool, that tool should be vendored into the app.

    I'm inclined to take the advice of a conglomerate of devs who have worked on hundreds of apps throughout their careers. Total freedom to develop my app is what I want. I do not want to be at the mercy of whatever the distro devs felt was best for their OS. pipenv install xxxx or poetry install xxxx will only resolve library conflicts from within my app. It won't tell me that changing xxxx inside of system Python will break something in the docker OS (or vice versa). It's ignorant of whatever I have in that docker OS.
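    For reference, here is roughly what the in-project-venv route from the question's title can look like with pipenv in a multi-stage build (a sketch, assuming a Pipfile/Pipfile.lock at the build context root and a python:3.10-slim base; adjust paths and tags to taste):

    # Stage 1: have pipenv create the venv inside the project directory
    FROM python:3.10-slim as build
    RUN pip install pipenv

    WORKDIR /usr/app
    COPY Pipfile Pipfile.lock ./
    # PIPENV_VENV_IN_PROJECT=1 pins the venv to /usr/app/.venv,
    # which makes it trivial to COPY into the next stage.
    ENV PIPENV_VENV_IN_PROJECT=1
    RUN pipenv install --deploy

    # Stage 2: a clean runtime image with neither pipenv nor build tooling
    FROM python:3.10-slim
    WORKDIR /usr/app
    COPY --from=build /usr/app/.venv ./.venv
    COPY . .

    ENV PATH="/usr/app/.venv/bin:$PATH"
    CMD [ "python", "app.py" ]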

    I think the better question here is why the devs have that PIPENV_VENV_IN_PROJECT/--system flag. I am sure there's a reason why it exists. Probably some isolated use case that I can't think of. But there's more reasons to not use it than to use it.

    EDIT: Found the use case from the pipenv docs: Deploying System Dependencies

    You can tell Pipenv to install a Pipfile’s contents into its parent system with the --system flag:

    This is useful for managing the system Python, and deployment infrastructure (e.g. Heroku does this).

    Notice how that says "for managing the system Python, and deployment infrastructure" and nothing about your app? You can literally use pipenv to manage your system Python yourself outside of the context of your app. That's why that flag exists.

    Also found an old issue on their GitHub titled pipenv install --system - unsure what it does:

    So the workflow is like using pipenv to lock your packages in your local development environment, so you have a Pipfile and Pipfile.lock which are up-to-date. Then you could deploy it with pipenv install --deploy --system on your production server. The --deploy will make sure your packages are properly locked in Pipfile.lock, since it will check the hashes.

    Again, that's outside of the context of your app. It's about managing system Python as applied to IT infrastructure (outside of docker). Meaning, if you have system programs that rely on Python you've installed via apt (because they aren't on PyPI to install via pip), you can resolve potential conflicts by using pipenv, as apt really sucks at that (just as much as pip does). But if you're installing anything via apt to support your app, see the 12-factor deal above.

    EDIT 2: Via Best practices for containerizing Python applications with Docker - SNYK:

    If we just run pip install, we will install many things in many places, making it impossible to perform a multi-stage build. In order to solve this we have two potential solutions:

    Use pip install --user

    Use a virtualenv

    Using pip install --user could seem like a good option since all the packages will be installed in the ~/.local directory, so copying them from one stage to another is quite easy. But it creates another problem: you’d be adding all the system-level dependencies from the image we used for compiling the dependencies to the final Docker base image — and we don’t want that to happen (remember our best practice to achieve as small a Docker base image as possible).

    # Stage 1: build the virtualenv with the compiler toolchain available
    FROM python:3.10-slim as build
    RUN apt-get update
    RUN apt-get install -y --no-install-recommends \
        build-essential gcc
    
    WORKDIR /usr/app
    RUN python -m venv /usr/app/venv
    ENV PATH="/usr/app/venv/bin:$PATH"
    
    COPY requirements.txt .
    RUN pip install -r requirements.txt
    
    # Stage 2: copy only the finished venv into a clean runtime image
    FROM python:3.10-slim@sha256:2bac43769ace90ebd3ad83e5392295e25dfc58e58543d3ab326c3330b505283d
    WORKDIR /usr/app
    COPY --from=build /usr/app/venv ./venv
    COPY . .
    
    ENV PATH="/usr/app/venv/bin:$PATH"
    CMD [ "python", "app.py" ]