I have a bash script that outputs a file manifest with MD5 hashes as JSON, like so:
{
"files": [
{
"md5": "f30ae4b2e0d2551b5962995426be0c3a",
"path": "assets/asset_1.png"
},
{
"md5": "ca8492fdc3547af31afeeb8656619ef0",
"path": "assets/asset_2.png"
}
]
}
It will return a list of all files except .gdz.
The command I am using is:
echo "{\"files\": [$(find . -type f -print | grep -v \.gdz$ | xargs md5sum | sed 's/\.\///' | xargs printf "{\"md5\": \"%s\", \"name\": \"%s\"}," | sed 's/,$//')]}" > files.json
However, when I run this in production, it sometimes switches the MD5 hash and the file path around. I cannot work out why this happens. Does anyone know?
You could run md5sum on all matching files, then do the rest with jq:
find . -type f -not -name '*.gdz' -exec md5sum -z {} + \
  | jq --slurp --raw-input '
    {
      files: split("\u0000")
        # md5sum -z ends every record with NUL, so drop the empty last piece
        | map(select(length > 0))
        | map(split(" "))
        | map([
            .[0],
            (.[2:] | join(" "))
          ])
        | map({md5: .[0], path: .[1]})
    }'
The find command runs md5sum on all matching files (in a single invocation unless the argument list gets too long) and emits one output record per file, with records separated by null bytes.
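The null separators matter because the pipeline in the question lets xargs split its input on every blank. One plausible cause of the swapped fields (an assumption; the question does not say which filenames exist in production) is a path that contains a space. The sketch below feeds two made-up md5sum-style lines through the same xargs printf step; the hashes and paths are placeholders.

```shell
#!/usr/bin/env bash
# Two fake md5sum-style lines; the first path contains a blank.
# "aaa"/"bbb" stand in for real hashes, and the paths are hypothetical.
sample='aaa  assets/my asset.png
bbb  assets/asset_2.png'

# Same formatting step as in the question: xargs splits on blanks and
# newlines, so printf receives five arguments instead of four.
swapped=$(printf '%s\n' "$sample" | xargs printf '{"md5": "%s", "name": "%s"},')

printf '%s\n' "$swapped"
```

This prints `{"md5": "aaa", "name": "assets/my"},{"md5": "asset.png", "name": "bbb"},{"md5": "assets/asset_2.png", "name": ""},`. Every field after the split filename is shifted by one position, so the hash of the second file lands in the name slot.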
The jq then does the following (and can almost certainly be optimized):
- --slurp and --raw-input read the whole input before any processing
- files is used as the key of the output object
- split("\u0000") creates an array from the null-byte-separated input records
- map(split(" ")) converts each array element to an array split on blanks
- map([ .[0], (.[2:] | join(" ")) ]): to allow blanks in filenames, we create an array for each record where the first element is the MD5 hash and the second element is the concatenation of the rest, i.e., the filename; [2:] because we want to skip the two blanks between hash and filename
- map({md5: .[0], path: .[1]}) converts each two-element array into an object with the desired keys
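To see the record splitting without jq, here is a rough bash-only sketch of the same steps: read NUL-terminated records, take the hash, skip the two separator characters, and keep the rest as the path. It assumes GNU md5sum (for -z) and bash (for read -d ''); the temporary directory, file names, and the variables tmp, parsed, record, hash, and path are made up for the demonstration. It prints plain "hash path" lines rather than JSON, since it is only meant to illustrate the parsing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with hypothetical files.
tmp=$(mktemp -d)
mkdir -p "$tmp/assets"
printf 'one' > "$tmp/assets/my asset.png"   # name with a blank
printf 'two' > "$tmp/assets/skip.gdz"       # must be excluded

parsed=$(
  cd "$tmp"
  find . -type f -not -name '*.gdz' -exec md5sum -z {} + |
  while IFS= read -r -d '' record; do
    hash=${record:0:32}   # an MD5 digest is 32 hex characters
    path=${record:34}     # skip the two separator characters after it
    path=${path#./}       # drop the leading ./ that find adds
    printf '%s %s\n' "$hash" "$path"
  done
)

printf '%s\n' "$parsed"
rm -rf "$tmp"
```

Because the records are NUL-terminated, the blank in "my asset.png" stays inside the path instead of shifting later fields.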