xquery marklogic

How to get the checksum of a binary file in a MarkLogic database


The objective is to detect whether a binary file has been changed / modified. If there is a "real" change, a certain process will be triggered.

Binary files such as PDF or ZIP are stored in MarkLogic. I am thinking of generating a checksum for those binary files, the way xdmp:md5 can be used for an XML node. However, xdmp:md5 appears to work for strings only. How can I do this for a binary stored at a URI in the database?

Or should I simply use an external tool to generate the checksum and store that file signature as a property of the binary file?


Solution

The xdmp:md5 and xdmp:sha* hash functions accept an item() that is either an xs:string or a binary() node. The documentation describes the $data parameter as:

    $data Data to be hashed. Must be xs:string or a binary node.

Note that you need to pass the binary() node, not the document-node(). So, if you select the document with fn:doc(), use XPath to step down to the binary() node:

    doc("/myDoc.pdf")/binary() => xdmp:md5()

If you pass the document-node() instead of the binary() node, the error message is a bit misleading:

    arg1 is not of type xs:string

It should probably state: "arg1 is not of type xs:string or binary()"
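
To tie this back to the change-detection goal in the question, here is a minimal sketch of one way to do it: hash the binary() node and compare the result with a hash previously stored in the document's properties. The URI "/myDoc.pdf" comes from the example above; the property name "checksum" and the "changed"/"unchanged" return values are hypothetical, chosen only for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI and property name, for illustration only. :)
declare variable $uri := "/myDoc.pdf";
declare variable $prop-name := xs:QName("checksum");

(: Hash the binary() child of the document node, not the document node itself. :)
let $current := xdmp:md5(fn:doc($uri)/binary())
(: Previously stored hash, if any, read from the document's properties. :)
let $stored := xdmp:document-get-properties($uri, $prop-name)/fn:string()
return
  if ($stored = $current) then
    "unchanged"
  else (
    (: First run or real change: record the new hash for the next comparison. :)
    xdmp:document-set-property($uri, element checksum { $current }),
    "changed"
  )
```

The "changed" branch is where the follow-up process would be triggered. If MD5 is not strong enough for your purposes, one of the xdmp:sha* functions mentioned above (for example xdmp:sha256) could presumably be swapped in the same way.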