javascript cryptography md5 cryptojs md5-file

MD5 checksum not calculated properly for files other than txt?


I am using crypto-js to calculate the MD5 checksum of my file before uploading it. Below is my code.

import CryptoJS from "crypto-js";

const getMd5 = async (fileObject) => {
  let md5 = "";
  try {
    const fileObjectUrl = URL.createObjectURL(fileObject);
    const blobText = await fetch(fileObjectUrl)
      .then((res) => res.blob())
      .then((res) => new Response(res).text());

    const hash = CryptoJS.MD5(CryptoJS.enc.Latin1.parse(blobText));
    md5 = hash.toString(CryptoJS.enc.Hex);
  } catch (err) {
    console.log("Error occurred in getMd5:", err);
  }
  return md5;
};

The code above works fine for text files, but for non-text files such as images and videos the checksum is calculated incorrectly.

Any help/input is appreciated. Thanks!


Solution

  • Response.text() reads the response stream and decodes it into a string as UTF-8. Arbitrary binary data that is not valid UTF-8 (e.g. images, videos, etc.) is corrupted in the process, because invalid byte sequences are replaced with the replacement character U+FFFD; see also the other answer.
    This is prevented by using Response.arrayBuffer() instead, which returns the data unchanged in an ArrayBuffer.
    Since CryptoJS works internally with WordArrays, the ArrayBuffer must then be converted into a WordArray.

    The following fix works on my machine:

    (async () => {

        const getMd5 = async (fileObject) => {
            let md5 = "";
            try {
                // Use the function parameter here, not the outer test blob
                const fileObjectUrl = URL.createObjectURL(fileObject);
                const arrayBuffer = await fetch(fileObjectUrl)
                    .then((res) => res.blob())
                    .then((res) => new Response(res).arrayBuffer());                    // Read the raw bytes as an ArrayBuffer
                URL.revokeObjectURL(fileObjectUrl);                                     // Release the temporary object URL
                const hash = CryptoJS.MD5(CryptoJS.lib.WordArray.create(arrayBuffer)); // Import as WordArray
                md5 = hash.toString(CryptoJS.enc.Hex);
            } catch (err) {
                console.log("Error occurred in getMd5:", err);
            }
            return md5;
        };

        // Test data containing bytes that are not valid UTF-8
        const blob = new Blob([new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80, 0x81, 0xfd, 0xfe, 0xff])]);
        console.log(await getMd5(blob));

    })();
    <script src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.0.0/crypto-js.min.js"></script>

    For simplicity, the test does not use a file object but a blob object whose data is not valid UTF-8. The generated hash is correct and can be verified with an online MD5 tool.
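
To make the corruption caused by Response.text() more concrete, here is a minimal sketch that uses the same nine test bytes. It does not call fetch or CryptoJS; as an assumption for illustration, it uses the standard TextDecoder and TextEncoder globals to mimic the UTF-8 decoding step, and the helper name toHex is made up for this example. It only shows that the bytes which would end up being hashed are no longer the original ones.

```javascript
// Same nine test bytes as in the answer; bytes 0x80 and above are not valid UTF-8 on their own.
const original = new Uint8Array([0x01, 0x02, 0x03, 0x7f, 0x80, 0x81, 0xfd, 0xfe, 0xff]);

// Roughly what Response.text() does: decode the bytes as UTF-8.
// Every invalid byte is replaced by the replacement character U+FFFD.
const decoded = new TextDecoder("utf-8").decode(original);

// Encoding the string back to bytes shows that the original data is lost.
const roundTripped = new TextEncoder().encode(decoded);

// Hypothetical helper: format bytes as space-separated hex for comparison.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log("original:     ", toHex(original));
console.log("after text(): ", toHex(roundTripped));
console.log("lengths:      ", original.length, "vs", roundTripped.length);
```

The first four bytes survive because they are plain ASCII. Each of the remaining five bytes turns into the three-byte sequence ef bf bd (the UTF-8 encoding of U+FFFD), so any hash computed from the decoded string cannot match the hash of the file.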