Bash to create a JSON File Manifest

I have a bash script that outputs a file manifest with MD5 hashes as JSON, like so:

{
  "files": [
    {
      "md5": "f30ae4b2e0d2551b5962995426be0c3a",
      "path": "assets/asset_1.png"
    },
    {
      "md5": "ca8492fdc3547af31afeeb8656619ef0",
      "path": "assets/asset_2.png"
    }
  ]
}

The manifest should list every file except those ending in .gdz.

The command I am using is:

echo "{\"files\": [$(find . -type f -print | grep -v \.gdz$ | xargs md5sum | sed 's/\.\///' | xargs printf "{\"md5\": \"%s\", \"name\": \"%s\"}," | sed 's/,$//')]}" > files.json

However, when I run this in production, it sometimes swaps the MD5 hash and the file path. I can't work out why; does anyone know?

Solution
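A likely cause of the swap, offered as an assumption since the production file list isn't shown: the second xargs hands the hash and path tokens to printf in batches limited by the maximum command-line size (GNU xargs prints its limits with `xargs --show-limits`). printf reuses its two-`%s` format until the arguments run out, so if a batch ends after an odd number of tokens, the next printf run starts with a path in the "md5" slot and every later pair is shifted. File names containing blanks or quotes break the pipeline separately, because xargs splits its input on whitespace and interprets quotes. This sketch forces tiny batches with `-n 3` on made-up tokens (`hash1`, `path1`, ...) to make the shift visible:

```shell
#!/usr/bin/env bash
# Made-up tokens standing in for md5sum output: hash, path, hash, path.
# -n 3 caps each printf run at 3 arguments, imitating the size-based
# batching xargs applies to a long real hash list.
out=$(printf '%s\n' hash1 path1 hash2 path2 \
  | xargs -n 3 printf '{"md5": "%s", "name": "%s"},')
echo "$out"
# The first run prints hash1/path1, then hash2 with an empty name;
# the second run starts with path2, which lands in the "md5" field.
```

If this is the cause, it would explain why the problem appears in production (many files, several batches) but not with a handful of test files. Letting find call md5sum directly, as below, removes the second xargs altogether.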

You could run md5sum on all matching files, then do the rest with jq:

find . -type f -not -name '*.gdz' -exec md5sum -z {} + \
    | jq --slurp --raw-input '
        {
            files: split("\u0000")
                # drop the empty string left after the final NUL terminator
                | map(select(length > 0))
                | map(split(" "))
                | map([
                    .[0],
                    (.[2:] | join(" "))
                ])
                | map({md5: .[0], path: .[1]})
        }'

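Here find runs md5sum directly on the matching files, passing as many names per invocation as fit on one command line (for most trees, a single run), so file names are never split on whitespace. The -z option makes md5sum end each output record with a null byte instead of a newline and disables its file name escaping.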
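To see the record layout that the jq program consumes, here is a plain-bash sketch (bash 4+ for the associative array; the test file names are hypothetical) that reads the same NUL-terminated records with `read -d ''` and slices each one at fixed offsets: 32 characters of digest, two blanks, then the path exactly as find passed it, spaces included.

```shell
#!/usr/bin/env bash
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
# Hypothetical test files; one name contains a double blank.
printf 'x' > 'name with  spaces.png'
printf 'y' > plain.png
declare -A md5_of=()
# Each record: 32-char hex digest, two blanks, path, then a NUL byte.
while IFS= read -r -d '' rec; do
  md5_of["${rec:34}"]=${rec:0:32}   # map path -> digest
done < <(find . -type f -exec md5sum -z {} +)
for path in "${!md5_of[@]}"; do
  printf '%s %s\n' "${md5_of[$path]}" "$path"
done
```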

The jq program then does the following (it could almost certainly be optimized):

- --slurp and --raw-input together read the entire input as a single string before any processing
- At the outermost level, we build an object with files as the key
- split("\u0000") creates an array from the null-byte-separated input records
- map(split(" ")) converts each array element into an array split on blanks
- map([ .[0], (.[2:] | join(" ")) ]): to allow blanks in file names, we build a two-element array for each record whose first element is the MD5 hash and whose second element is the rest of the record rejoined with single blanks, i.e., the file name. The slice starts at index 2 because md5sum separates hash and name with two blanks, so splitting leaves an empty string at index 1
- map({md5: .[0], path: .[1]}) converts each two-element array into an object with the desired keys
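Because an MD5 digest is always 32 hexadecimal characters and GNU md5sum's default text-mode output puts two blanks between digest and name, one possible simplification slices each record at fixed offsets instead of splitting and rejoining. This is a sketch assuming GNU coreutils and jq; it also strips find's leading ./ with ltrimstr to match the paths in the question, and sorts by path because find's traversal order isn't guaranteed. The file names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir assets
printf 'one' > 'assets/asset 1.png'   # hypothetical name with a blank
printf 'two' > assets/asset_2.png
printf 'skip' > assets/data.gdz       # excluded by -not -name
# Assumes GNU md5sum text mode: 32 hex chars, two blanks, then the path.
find . -type f -not -name '*.gdz' -exec md5sum -z {} + \
  | jq --slurp --raw-input --compact-output '
      {
        files: split("\u0000")
          | map(select(length > 0))   # drop the empty tail after the last NUL
          | map({md5: .[0:32], path: (.[34:] | ltrimstr("./"))})
          | sort_by(.path)
      }' > files.json
cat files.json
```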