I have an issue whereby my automated build needs to zip the contents of a directory and Sha256sum the zip file. Easy enough.
However, the next run of the automated build needs to zip the same contents and Sha256sum the zip file in order to see whether any source code has changed.
Locally I've run the following commands and generated a zip with the same hash each time (expected, as I haven't changed any of the code):
zip -q -r -X my-directory.zip my_directory/* --exclude ".gitignore" "requirements.txt" "*__pycache__/*" "*/\infrastructure/*"
sha256sum my-directory.zip | awk '{ print $1 }' > my-directory.zip.hash
cat my-directory.zip.hash
My build runs the same commands, but at the start of each run it checks out the code from GitHub (as the build runs inside a Docker container), which results in a different hash despite no code changes.
I've re-created the problem locally by deleting the repo and re-cloning.
Any ideas? I am thinking it's metadata of some sort, but I have tried different exclusion commands without luck.
Your current method looks error-prone. It relies on carefully constructed --exclude parameters, and assumes no unexpected files. That's very fragile.
A better way would be to use the git archive command to create the zip:
git archive HEAD -o my-directory.zip my_directory
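This will disregard ignored files and any other files that are not part of the repository. git archive also takes file timestamps from the commit rather than from the checkout, so re-cloning the same commit should produce an identical archive.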
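As a rough check of that claim, a sketch along these lines hashes the archive of a given commit. Running it in two separate clones of the same commit should print the same value. archive_hash is a hypothetical helper name, and the sketch assumes git, sha256sum and awk are on the PATH.

```shell
# archive_hash is a hypothetical helper, not part of git.
# $1 = commit to archive (e.g. HEAD), $2 = path inside the repository.
# Must be run from inside the repository.
archive_hash() {
    # Without -o the archive goes to stdout, so the format must be given explicitly.
    git archive --format=zip "$1" "$2" | sha256sum | awk '{ print $1 }'
}

# Usage (inside the repository): archive_hash HEAD my_directory
```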
Note, however, that git archive adds the commit ID to the zip file as its comment.
If for some reason you want to remove that, you can do so by running this additional command:
zip -z my-directory.zip < /dev/null
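Putting the two commands together, the build step might look something like the sketch below. hash_source_zip and source_changed are hypothetical helper names, and the .hash file follows the layout from the question. Treat it as a starting point under those assumptions rather than a drop-in script.

```shell
# Hypothetical helpers; assumes git, zip, sha256sum and awk are installed.

# $1 = directory inside the repository, $2 = output zip path.
# Prints the SHA-256 of the resulting zip. Run from inside the repository.
hash_source_zip() {
    git archive --format=zip -o "$2" HEAD "$1" || return 1
    # Replace the archive comment (the commit ID) with an empty one;
    # zip's prompt text is discarded.
    zip -z "$2" < /dev/null > /dev/null || return 1
    sha256sum "$2" | awk '{ print $1 }'
}

# $1 = freshly computed hash, $2 = file holding the hash from the previous run.
# Succeeds (exit 0) when there is no previous hash or it differs from the new one.
source_changed() {
    [ ! -f "$2" ] || [ "$(cat "$2")" != "$1" ]
}
```

In the build this could be used as new=$(hash_source_zip my_directory my-directory.zip), followed by a check such as source_changed "$new" my-directory.zip.hash before writing the new hash to that file. Since the timestamps inside the archive come from the commit, the hash is only expected to match when HEAD points at the same commit.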