A couple of days ago I was playing with pixel-by-pixel canvas manipulation, and I noticed a slight performance increase when writing pixels through a 32-bit view of the buffer (Uint32Array) instead of an 8-bit one (Uint8ClampedArray).
Example:
```javascript
var canvas = document.querySelector("canvas");
var ctx = canvas.getContext("2d");
// getImageData takes (x, y, width, height)
var image_data = ctx.getImageData(0, 0, canvas.width, canvas.height);
// One buffer, two views over the same bytes
var image_buffer = new ArrayBuffer(image_data.data.length);
var image_buffer8 = new Uint8ClampedArray(image_buffer);
var image_buffer32 = new Uint32Array(image_buffer);
var pixel, color;

// Version 1: four 8-bit writes per pixel
console.time("array-index");
for (pixel = 0; pixel < image_buffer8.length; pixel += 4) {
  color = Math.random() * 255;
  image_buffer8[pixel] = color;     // Red
  image_buffer8[pixel + 1] = color; // Green
  image_buffer8[pixel + 2] = color; // Blue
  image_buffer8[pixel + 3] = 255;   // Alpha
}
console.timeEnd("array-index");

// Version 2: one 32-bit write per pixel (alpha in the top byte,
// which lands as the 4th byte in memory on little-endian machines)
console.time("array-bitwise");
for (pixel = 0; pixel < image_buffer32.length; pixel++) {
  color = Math.random() * 255;
  image_buffer32[pixel] = (255 << 24) | (color << 16) | (color << 8) | color;
}
console.timeEnd("array-bitwise");
```
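Both loops write into the same ArrayBuffer, so it helps to see how one 32-bit value maps onto the four bytes the canvas reads as R, G, B, A. Typed arrays use the platform's byte order, and on little-endian hardware (the common case for desktop and mobile CPUs) the least significant byte is stored at the lowest address, so `a << 24 | b << 16 | g << 8 | r` ends up in memory as r, g, b, a. In the question's code R, G and B are all the same `color`, so only the position of alpha depends on this. A minimal sketch; `packRGBA` and `isLittleEndian` are hypothetical helper names, not browser APIs.

```javascript
// Hypothetical helper: packs one pixel into a 32-bit value with alpha in the
// top byte and red in the bottom byte.
function packRGBA(r, g, b, a) {
  // ">>> 0" reinterprets the signed 32-bit result of "|" as unsigned
  return ((a << 24) | (b << 16) | (g << 8) | r) >>> 0;
}

// Hypothetical helper: detects byte order by viewing one 32-bit word as bytes.
function isLittleEndian() {
  const word = new Uint32Array([0x01020304]);
  return new Uint8Array(word.buffer)[0] === 0x04;
}

// Write one pixel through the 32-bit view, read it back through the 8-bit view.
const buffer = new ArrayBuffer(4);
const view32 = new Uint32Array(buffer);
const view8 = new Uint8ClampedArray(buffer);
view32[0] = packRGBA(10, 20, 30, 255);

// On a little-endian machine the bytes read back as 10, 20, 30, 255 (R, G, B, A);
// on a big-endian one they would come out in the reverse order.
console.log("little-endian:", isLittleEndian(), "bytes:", Array.from(view8));
```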
The output is:
array-index: 4.273ms
array-bitwise: 3.743ms
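A single timed run of each loop is a noisy measurement: the first loop can also absorb JIT warm-up, and the timings are close. The two loops also do not write exactly the same bytes, because Uint8ClampedArray rounds a fractional `color` to the nearest integer whereas `<<` truncates it. The sketch below assumes a hypothetical 512 x 512 image and replaces `Math.random()` with a deterministic integer color (`i & 255`); it first checks that both loops produce identical bytes (which holds on little-endian machines) and then repeats each fill and reports the median time. The numbers it prints will vary by engine and hardware.

```javascript
// Assumed image size for the sketch.
const WIDTH = 512, HEIGHT = 512;
const buffer = new ArrayBuffer(WIDTH * HEIGHT * 4);
const view8 = new Uint8ClampedArray(buffer);
const view32 = new Uint32Array(buffer);

// Four 8-bit stores per pixel.
function fill8() {
  for (let p = 0, i = 0; p < view8.length; p += 4, i++) {
    const color = i & 255; // deterministic integer grey level
    view8[p] = color;
    view8[p + 1] = color;
    view8[p + 2] = color;
    view8[p + 3] = 255;
  }
}

// One 32-bit store per pixel.
function fill32() {
  for (let i = 0; i < view32.length; i++) {
    const color = i & 255;
    view32[i] = (255 << 24) | (color << 16) | (color << 8) | color;
  }
}

// Runs fn `runs` times and returns the median duration in milliseconds.
function medianMs(fn, runs) {
  const times = [];
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(runs / 2)];
}

// Correctness check: both fills must leave the same bytes in the buffer.
fill8();
const bytesFrom8 = view8.slice(); // copy of the 8-bit result
fill32();
const same = view8.every((v, k) => v === bytesFrom8[k]);
console.log("identical output:", same);

console.log("8-bit  median ms:", medianMs(fill8, 21).toFixed(3));
console.log("32-bit median ms:", medianMs(fill32, 21).toFixed(3));
```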
The question is:
Why is accessing the array through a 32-bit view faster, even though the loop contains bitwise operators? As I see it, the bitwise arithmetic should also cost CPU time.
I am interested in the following aspects:
The 8-bit assignments are much more expensive than the bitwise operations. You have to look at this from the way modern CPUs are architected: internally, all the data paths are (at least) 32 bits wide. Moving data from one place to another, in this case a calculated result, "costs" the same whether you move 8 bits or 32 bits. So in the 8-bit case you issue four stores per pixel instead of one, and even if the data only travels from the CPU's execution unit to the Level 1 cache, that is still roughly four times the cost of a single 32-bit store. The shifts and ORs, by contrast, typically operate on values already held in registers and are among the cheapest instructions a CPU executes.
When coding in statically typed languages with modern compilers, such as C, the compiler could possibly optimize this kind of code automatically, merging the four 8-bit assignments into a single 32-bit store or using SIMD (Single Instruction, Multiple Data) machine instructions to write several pixels at once (even if it is not likely). That is much harder to do in a dynamic language such as JavaScript, even when it runs in a JIT-compiled environment (translated to native code at run time).
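In JavaScript the "four byte stores become one 32-bit store" rewrite has to be done by hand, which is what the second loop in the question does. The sketch below shows the two forms side by side for single pixels; `writePixelBytes` and `writePixelWord` are hypothetical helper names. It uses `DataView.prototype.setUint32` with its `littleEndian` argument set to `true`, so R lands at the lowest address on any platform. This illustrates the transformation only; it makes no claim that a DataView store is faster than a typed-array store in a given engine.

```javascript
// Four separate 8-bit stores per pixel.
function writePixelBytes(bytes, offset, r, g, b, a) {
  bytes[offset] = r;
  bytes[offset + 1] = g;
  bytes[offset + 2] = b;
  bytes[offset + 3] = a;
}

// One 32-bit store per pixel. The third argument `true` requests
// little-endian byte order, so the bytes land as R, G, B, A everywhere.
function writePixelWord(view, offset, r, g, b, a) {
  view.setUint32(offset, ((a << 24) | (b << 16) | (g << 8) | r) >>> 0, true);
}

// Two pixels written with four byte stores each.
const a8 = new Uint8ClampedArray(8);
writePixelBytes(a8, 0, 1, 2, 3, 255);
writePixelBytes(a8, 4, 40, 50, 60, 128);

// The same two pixels written with one 32-bit store each.
const b8 = new Uint8ClampedArray(8);
const dv = new DataView(b8.buffer);
writePixelWord(dv, 0, 1, 2, 3, 255);
writePixelWord(dv, 4, 40, 50, 60, 128);

// Both buffers hold the same bytes.
console.log(Array.from(a8), Array.from(b8));
```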