I noticed that Docker images may be large because pip keeps its cache in /root/.cache/pip. I know I can remove this directory after all my dependencies are installed in my Docker image. What I'm not sure about is how this relates to Docker's BuildKit, which allows quicker installation by using a cache. Are the two related? If I'd like to benefit from BuildKit, is it safe to remove /root/.cache/pip?
My question is motivated by heavy Python dependencies like torch and the nvidia packages, which may occupy a few GB. Removing the pip cache may decrease the size of the image by 2-3 GB.
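To check how much the cache actually contributes before deleting anything, a small helper like the one below can total up the files under the cache directory. It is only a sketch and not part of pip: tree_size and default_pip_cache are hypothetical names, and the code assumes the Linux default location (~/.cache/pip, or $XDG_CACHE_HOME/pip when that variable is set), which is /root/.cache/pip when building as root. It ignores any PIP_CACHE_DIR override; on pip 20.1 or later, python -m pip cache dir reports the location pip really uses.

```python
import os
from pathlib import Path


def default_pip_cache() -> Path:
    """Guess pip's cache directory on Linux (assumption: no PIP_CACHE_DIR override)."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")
    return Path(base) / "pip"


def tree_size(root: Path) -> int:
    """Return the total size in bytes of all regular files below root (0 if missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks so nothing is counted twice
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    cache = default_pip_cache()
    print(f"{cache}: {tree_size(cache) / 1024**3:.2f} GiB")
```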
The better solution here is to not cache the packages in the first place (you're not going to need them anyway; the image build process won't benefit from them unless you're doing something terrible).
The simplest solution is to pass --no-cache-dir to your pip invocations, so pip never writes the packages to its cache on disk. Alternatively, you can drop a pip.conf containing:
[global]
no-cache-dir = True
to /etc/pip.conf in the container to disable caching globally (without needing to pass the switch each time). Note that if your image ships with a version of pip prior to 19.0.1, the pip.conf solution is buggy; in that case, use the --no-cache-dir command-line switch manually to upgrade pip to 19.0.1 or later, then add the no-cache-dir line to /etc/pip.conf if needed.
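If you're not sure which pip version a base image ships, a check along these lines could decide between relying on /etc/pip.conf and passing the flag explicitly. It is a sketch under stated assumptions: parse_release and pip_conf_is_reliable are hypothetical helpers, the 19.0.1 threshold is taken from the note above rather than re-verified against pip's changelog, and the parser simply stops at pre-release suffixes instead of implementing full version semantics (the packaging library's Version class handles that properly).

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Threshold from the answer: no-cache-dir in pip.conf is unreliable before pip 19.0.1.
PIP_CONF_FIXED_IN = (19, 0, 1)


def parse_release(text: str) -> tuple[int, ...]:
    """Extract the leading numeric release parts, e.g. '24.1b1' -> (24, 1)."""
    parts: list[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at a pre-release/dev suffix
            break
    return tuple(parts)


def pip_conf_is_reliable(pip_version: str) -> bool:
    """True if this pip version is at or above the threshold above."""
    return parse_release(pip_version) >= PIP_CONF_FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("pip")
    except PackageNotFoundError:
        raise SystemExit("pip is not installed in this interpreter")
    if pip_conf_is_reliable(installed):
        print(f"pip {installed}: /etc/pip.conf is enough")
    else:
        print(f"pip {installed}: pass --no-cache-dir and upgrade pip first")
```

Tuple comparison does the work here: (19, 0) sorts before (19, 0, 1), so 19.0 is correctly treated as too old.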
Bonus: you may want to expand the pip.conf to:
[install]
compile = no
[global]
no-cache-dir = True
where compile = no tells pip not to compile the Python source files to bytecode on install. The benefit of precompiled bytecode is (slightly) faster startup, but it bloats your image, which then takes longer to download and run, so the cost to the Docker layer outweighs any savings in Python launch time.
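To get a feel for what compile = no saves, the sketch below compiles a directory with the standard library's compileall (roughly what an install-time bytecode step does) and compares the source size with the .pyc files written for it. bytecode_overhead and the generated demo_pkg are hypothetical; real packages like torch will give very different numbers.

```python
from __future__ import annotations

import compileall
import importlib.util
import tempfile
from pathlib import Path


def bytecode_overhead(source_dir: Path) -> tuple[int, int]:
    """Compile every .py file under source_dir; return (source bytes, bytecode bytes)."""
    sources = sorted(source_dir.rglob("*.py"))
    compileall.compile_dir(str(source_dir), quiet=1)
    src_bytes = sum(p.stat().st_size for p in sources)
    # cache_from_source gives the .pyc path compileall writes for each source file
    pyc_bytes = sum(
        Path(importlib.util.cache_from_source(str(p))).stat().st_size for p in sources
    )
    return src_bytes, pyc_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "demo_pkg")
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        body = "".join(f"def f{i}(x):\n    return x * {i} + {i}\n\n" for i in range(200))
        (pkg / "funcs.py").write_text(body)
        src, pyc = bytecode_overhead(pkg)
        print(f"source: {src} bytes, extra bytecode: {pyc} bytes")
```

Every byte in the second number is added to the layer on top of the sources, which is the cost the compile = no setting avoids.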
Lastly, add:
ENV PYTHONDONTWRITEBYTECODE=1
to your Dockerfile near the top of the file (it can be combined with other ENV settings to avoid extra layers). Where the pip.conf prevents compiling/writing bytecode on install, the environment variable prevents writing it at runtime (which would be pointless, since the bytecode is lost when the container exits anyway). Because "runtime" includes runs of pip itself to install new packages, it also keeps pip's dependencies from being compiled to bytecode in those layers.
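You can watch the variable take effect with a quick experiment: import a throwaway module in a child interpreter, once normally and once with PYTHONDONTWRITEBYTECODE=1, and check whether a __pycache__ directory appears. import_writes_bytecode and demo_mod are hypothetical names for this sketch; the environment handling is the part that matters.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_writes_bytecode(dont_write: bool) -> bool:
    """Import a throwaway module in a child interpreter; report whether __pycache__ appeared."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_mod.py").write_text("VALUE = 1\n")
        env = dict(os.environ)
        # Clear variables that would change where (or whether) bytecode is written
        for var in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONSAFEPATH"):
            env.pop(var, None)
        if dont_write:
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        # With -c, the child's sys.path starts with its working directory, so demo_mod is found
        subprocess.run([sys.executable, "-c", "import demo_mod"], cwd=tmp, env=env, check=True)
        return Path(tmp, "__pycache__").is_dir()


if __name__ == "__main__":
    print("default:", import_writes_bytecode(dont_write=False))
    print("PYTHONDONTWRITEBYTECODE=1:", import_writes_bytecode(dont_write=True))
```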