Tags: python, linux, docker, pip

Removing pip cache after installing dependencies in Docker image


I noticed that Docker images can be large because pip keeps its cache in /root/.cache/pip. I know I can remove this directory once all my dependencies are installed in the image. What I'm not sure about is how this relates to Docker's BuildKit, which allows quicker installation by using a cache. Are the two somehow related? If I'd like to benefit from BuildKit, is it safe to remove /root/.cache/pip? My question is motivated by heavy Python dependencies like torch and nvidia, which may occupy a few GB. Removing the pip cache may reduce the size of the image by 2-3 GB.


Solution

  • The better solution here is to not cache the packages in the first place. You're not going to need them anyway: the image build process won't benefit from them unless you're doing something terrible, such as downloading the same packages again in several separate install steps.

    The simplest approach is to pass --no-cache-dir to each pip invocation, so pip never writes the downloaded packages to its on-disk cache. Alternatively, you can drop a pip.conf containing:

    [global]
    no-cache-dir = True
    

    at /etc/pip.conf in the container to disable the cache globally (without needing to pass the switch each time). Note that if your image ships with a version of pip older than 19.0.1, the pip.conf setting is buggy; in that case, first run pip with the --no-cache-dir command-line switch to upgrade pip to 19.0.1 or later, then add the line to /etc/pip.conf if you still need it.
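
    As a rough sketch, the two options could look like this in a Dockerfile. The base image tag, the requirements.txt file, and a pip.conf sitting next to the Dockerfile are assumptions for illustration; none of them come from the question.

    ```dockerfile
    # Assumed base image; substitute whatever your project actually uses.
    FROM python:3.12-slim

    WORKDIR /app

    # Variant A: disable the cache per invocation with the command-line switch.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Variant B: disable it globally instead. Assumes a pip.conf with the
    # [global] section shown above exists in the build context.
    # COPY pip.conf /etc/pip.conf
    # RUN pip install -r requirements.txt
    ```

    Either variant should leave /root/.cache/pip empty in the resulting layer, so there is nothing to delete afterwards.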

    Bonus: You may want to expand the pip.conf to:

    [install]
    compile = no
    
    [global]
    no-cache-dir = True
    

    where compile = no tells pip not to compile the Python source files to bytecode on install. The benefit of pre-compiled bytecode is (slightly) faster startup, but it bloats your image, which then takes longer to download and start, so the cost to the Docker layer outweighs any savings at Python launch itself.

    Lastly, add:

    ENV PYTHONDONTWRITEBYTECODE=1
    

    to your Dockerfile, near the top of the file (it can be combined with other ENV settings to avoid extra layers). Where the pip.conf prevents compiling/writing bytecode on install, the environment variable prevents writing it at runtime, which would be a pointless exercise: when the container exits, the bytecode would be lost anyway. And since "runtime" includes runs of pip itself to install new packages, you want to avoid any of pip's dependencies being compiled to bytecode in those layers.
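
    A minimal sketch of how the environment variable might sit in a Dockerfile follows. The base image tag, requirements.txt, and the APP_HOME variable are hypothetical and only there to make the example complete; the --no-cache-dir and --no-compile switches are the command-line counterparts of the pip.conf settings discussed above.

    ```dockerfile
    # Assumed base image; substitute whatever your project actually uses.
    FROM python:3.12-slim

    # One ENV instruction can carry several variables, which avoids extra layers.
    # APP_HOME is a hypothetical second variable, included only to show the syntax.
    ENV PYTHONDONTWRITEBYTECODE=1 \
        APP_HOME=/app

    WORKDIR ${APP_HOME}

    # Fails the build if the interpreter would still write .pyc files on import.
    RUN python -c "import sys; assert sys.dont_write_bytecode"

    # The environment variable keeps pip's own imports from writing .pyc files
    # during this step; --no-compile does the same for the packages being installed.
    COPY requirements.txt .
    RUN pip install --no-cache-dir --no-compile -r requirements.txt
    ```

    The assert step is optional; it is just a cheap way to confirm the variable is in effect before the install layer is built.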