I am building wheels for a pure Python project. The project needs to run in a Docker container, which goes through some heavy build processes after copying the wheel into it. I would like to take advantage of the Docker build cache, but in order to do that I need to ensure that the checksum of the built wheel file does not change unnecessarily.
Wheels are based on zip, which means that when some kinds of metadata in the included files change (timestamps, for example), the zipfile itself (and hence the wheel) changes. But I don't want those kinds of changes to affect the checksum of the wheel. I only want changes to the actual source code of the project to cause a rebuild.
I have thought through some fairly complex solutions to this problem, such as manually keeping a record of the individual checksums of the files listed in myproject.egg-info/SOURCES.txt, but is there a simpler way to prevent the wheel from changing if none of the source has significantly changed? (For example, is there a way to query setuptools or egg-info to tell if a rebuild would change anything?)
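If the Docker build also builds the wheel, then the input files will be the ones whose hash matters. For a COPY instruction, Docker decides whether the cache is still valid from a checksum of the contents of the copied files, and modification times are not taken into account, so the wheel's own checksum never enters the picture.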
FROM python
# this is where cache invalidation will happen
COPY . .
RUN python setup.py bdist_wheel
# ... now do the rest of the build steps
You can also do this as a multi-stage build if you want the final Docker image not to include all the infrastructure and files needed to build the wheel (https://pythonspeed.com/articles/smaller-python-docker-images/).
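A multi-stage variant might look roughly like the sketch below. The stage name `builder` and the paths `/src` and `/tmp/dist` are placeholders chosen for illustration, not anything from the project, and the sketch assumes that setuptools and wheel are available in the base image so that `bdist_wheel` works.

```dockerfile
# Stage 1: build the wheel from the project sources.
# "builder" and /src are illustrative names.
FROM python AS builder
WORKDIR /src
# this is where cache invalidation will happen
COPY . .
RUN python setup.py bdist_wheel

# Stage 2: the final image only receives the built wheel.
FROM python
# /tmp/dist is an arbitrary location for the copied wheel.
COPY --from=builder /src/dist/ /tmp/dist/
RUN pip install /tmp/dist/*.whl
# ... now do the rest of the build steps
```

If the sources are unchanged, the builder stage should come from the cache, so the wheel it produced is byte-for-byte the same one as before and the later steps can stay cached too. Adding a .dockerignore for files that are not part of the source (such as .git or an old dist directory) keeps unrelated changes from invalidating the COPY step.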