javascript node.js dataview arraybuffer

Node.js code working with (Array)Buffers uses the native readUInt16BE, but my context is a browser, so I want to use DataView.getUint16 instead


I came across a JPEG parsing function in Node.js that I'm attempting to adapt for use in a browser environment. The original code can be found here.

The original code uses Node.js' Buffer class. As I would like to use it in a browser environment, I have to use DataView.getUint16(0, false /* big endian */) instead of buffer.readUInt16BE(0) /* BE = big endian */.

Interestingly, DataView is also available in Node.js, so the result could work in both environments.

Here is what I found so far:

  • Introducing a variable j starting at 4 gives the correct offset for the first iteration, as the first 4 bytes of the buffer are sliced off:
  let j=4 // match the buffer slicing above
  • Adding 2 to j before the next read does not give the correct offset for the next iteration, even though the buffer appears to be sliced by exactly two more bytes:
    j+=2; // match the buffer slicing below ( i + 2 )
    buffer = buffer.slice(i + 2); // Buffer is sliced of two bytes, 0 offset is now 2 bytes further ?

Here is the function with logging added

function calculate (buffer) {

  // Skip 4 chars, they are for signature
  buffer = buffer.slice(4);
  let j=4 // match the buffer slicing above
  let aDataView=new DataView(buffer.buffer);
  var i, next;
  while (buffer.length) {
    // read length of the next block
    i = buffer.readUInt16BE(0);
    console.log("i="+i,"read="+aDataView.getUint16(j,false));
    j+=2; // match the buffer slicing below ( i + 2 )
    // ensure correct format
    validateBuffer(buffer, i);

    // 0xFFC0 is baseline standard(SOF)
    // 0xFFC1 is baseline optimized(SOF)
    // 0xFFC2 is progressive(SOF2)
    next = buffer[i + 1];
    if (next === 0xC0 || next === 0xC1 || next === 0xC2) {
      return extractSize(buffer, i + 5);
    }

    // move to the next block
    buffer = buffer.slice(i + 2);
  }

  throw new TypeError('Invalid JPG, no size found');
}

Actual result on this image:

node .\start.js 
i=16 read=16 # Seems to be the correct offset
i=91 read=19014 # Wrong offset
i=132 read=18758

My debugging steps so far: I installed buffer-image-size from npm (npm install buffer-image-size --save) and wrote start.js as follows:

var sizeOf = require('buffer-image-size');
const fs = require('fs');

const fileBuffer = fs.readFileSync("flowers.jpg");
var dimensions = sizeOf(fileBuffer);
console.log(dimensions.width, dimensions.height);

I then edited "node_modules\buffer-image-size\lib\types\jpg.js", adding the lines mentioned above and the logging.

Do you have any hints about:

  • Why adding 2 to j does not help to get the correct offset?
  • How to get the same algorithm without slicing the buffer over and over?

I appreciate any insights or guidance on resolving this issue. Thank you!


Solution
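
On the first question: buffer.slice(i + 2) drops i + 2 bytes (the whole block plus the two marker bytes that follow it), not 2, so j would have to advance by i + 2 as well. With j += 2 the second read lands on the payload right after the length field; in the log above, 19014 is 0x4A46, the ASCII bytes "JF" of the JFIF identifier. Also, buffer.buffer is the whole underlying ArrayBuffer, which slicing never shortens, which is why j had to start at 4. A minimal sketch of the effect, using made-up bytes (the layout and values are invented for illustration, not taken from a real image):

```javascript
// Invented sample: 4 "signature" bytes, then [length][payload][marker] twice.
const bytes = new Uint8Array([
  0xFF, 0xD8, 0xFF, 0xE0, // skipped as signature
  0x00, 0x04, 0xAA, 0xBB, // block length 4 (counts its own 2 bytes) + 2 payload bytes
  0xFF, 0xE1,             // marker of the next block
  0x00, 0x03, 0xCC,       // block length 3 + 1 payload byte
  0xFF, 0xC0,             // marker of the block after that
]);

// subarray(), like Buffer's slice(), returns a view over the same ArrayBuffer.
const sliced = bytes.subarray(4);
const sameBuffer = sliced.buffer === bytes.buffer; // true: offsets still count from byte 0

const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
const j = 4;
const first = view.getUint16(j, false);                   // 4
const wrongSecond = view.getUint16(j + 2, false);         // 0xAABB: payload bytes, not a length
const rightSecond = view.getUint16(j + first + 2, false); // 3

console.log(sameBuffer, first, wrongSecond.toString(16), rightSecond);
```
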

  • Avoid both advancing an offset and re-slicing the buffer; doing both only gets confusing. I would write:

    function calculate(typedArray) {
      const view = new DataView(typedArray.buffer, typedArray.byteOffset, typedArray.byteLength);
      let i = 0;
      // Skip 4 chars, they are for signature
      i += 4;
    
      while (i < view.byteLength) {
        // read length of the next block
        const blockLen = view.getUint16(i, false /* big endian */);
    
        // ensure correct format
        // index should be within buffer limits
        if (i + blockLen > view.byteLength) {
          throw new TypeError('Corrupt JPG, exceeded buffer limits');
        }
        // Every JPEG block must begin with a 0xFF
        if (view.getUint8(i + blockLen) !== 0xFF) {
          throw new TypeError('Invalid JPG, marker table corrupted');
        }
    
        // 0xFFC0 is baseline standard(SOF)
        // 0xFFC1 is baseline optimized(SOF)
        // 0xFFC2 is progressive(SOF2)
        const next = view.getUint8(i + blockLen + 1);
        if (next === 0xC0 || next === 0xC1 || next === 0xC2) {
          return extractSize(view, i + blockLen + 5);
        }
    
        // move to the next block
        i += blockLen + 2;
      }
    
      throw new TypeError('Invalid JPG, no size found');
    }
    

    Notice that this code, which is a straightforward translation of the source, is slightly confusing and buggy:

    • i does not point to the start of the segment, but rather two bytes into the segment (after the marker)
    • the code skips the first dynamic segment (which, admittedly, is required to be an APP0 segment anyway)
    • the code assumes all segments to have a variable length specified in their header, and ignores standalone markers as well as fill bytes
    • the code may cause RangeError exceptions from accessing bytes beyond the end of the buffer, as it only checks that the previous block is within limits
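
The points above could be addressed roughly as follows. This is only a sketch, not the library's code: readJpegSize is a hypothetical name, the set of SOF markers is kept the same as in the source (0xC0, 0xC1, 0xC2), and height and width are read inline on the assumption that the original extractSize reads two big-endian 16-bit values (height, then width) at the offset it is given.

```javascript
// Hypothetical helper; accepts a Uint8Array or a Node.js Buffer.
function readJpegSize(typedArray) {
  const view = new DataView(typedArray.buffer, typedArray.byteOffset, typedArray.byteLength);
  const end = view.byteLength;

  // A JPEG stream starts with the SOI marker 0xFFD8.
  if (end < 2 || view.getUint16(0, false) !== 0xFFD8) {
    throw new TypeError('Invalid JPG, missing SOI marker');
  }

  let pos = 2; // always points at the 0xFF that starts a marker
  while (pos + 2 <= end) {
    if (view.getUint8(pos) !== 0xFF) {
      throw new TypeError('Invalid JPG, marker table corrupted');
    }
    const marker = view.getUint8(pos + 1);

    // 0xFF followed by 0xFF is a fill byte: step over one byte and retry.
    if (marker === 0xFF) { pos += 1; continue; }

    // Standalone markers (TEM, RST0-RST7, SOI) carry no length field.
    if (marker === 0x01 || (marker >= 0xD0 && marker <= 0xD8)) { pos += 2; continue; }

    // EOI or SOS before any SOF: the headers hold no size.
    if (marker === 0xD9 || marker === 0xDA) break;

    // Every other segment has a 2-byte big-endian length that counts itself.
    if (pos + 4 > end) break;
    const segLen = view.getUint16(pos + 2, false);
    if (segLen < 2 || pos + 2 + segLen > end) {
      throw new TypeError('Corrupt JPG, exceeded buffer limits');
    }

    if (marker === 0xC0 || marker === 0xC1 || marker === 0xC2) {
      // SOF payload: precision (1 byte), height (2 bytes), width (2 bytes), ...
      if (segLen < 7) throw new TypeError('Corrupt JPG, SOF segment too short');
      return {
        height: view.getUint16(pos + 5, false),
        width: view.getUint16(pos + 7, false),
      };
    }

    pos += 2 + segLen; // marker bytes + segment
  }

  throw new TypeError('Invalid JPG, no size found');
}

// Usage with an invented header: SOI, one 4-byte segment, a fill byte, then a minimal SOF0.
const sample = new Uint8Array([
  0xFF, 0xD8,
  0xFF, 0xE0, 0x00, 0x04, 0xAA, 0xBB,
  0xFF,
  0xFF, 0xC0, 0x00, 0x07, 0x08, 0x00, 0x10, 0x00, 0x20,
]);
console.log(readJpegSize(sample)); // { height: 16, width: 32 }
```

Here pos always points at the start of a segment, so the first segment is inspected like any other, and every read is preceded by a bounds check.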