Canvas rendering performance issue

I have a set of images I that I want to process (tint) before drawing them onto the main canvas. For my use case, each image needs a different tint that has to be recalculated every frame.

To achieve this, I draw image I onto a secondary hidden canvas B, using a globalCompositeOperation to apply the desired effect. Up to this point, there are no noticeable performance issues. It's only when I draw B to the main canvas that I start to see a significant framerate drop. Chrome's performance profiler suggests that most of the time is spent on the GPU, not in scripting.
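A minimal sketch of the pipeline described above. The helper name drawTinted, the choice of 'source-atop' as the composite operation, and the parameter layout are assumptions for illustration, not the exact code from the project:

```javascript
// Hypothetical helper: tints `image` on a scratch 2D context, then copies the
// result onto the main 2D context at (x, y).
function drawTinted(main, scratch, image, tint, x, y) {
  const { width, height } = scratch.canvas;

  // 1. Start from an empty scratch canvas and draw the untinted image.
  scratch.globalCompositeOperation = 'source-over';
  scratch.clearRect(0, 0, width, height);
  scratch.drawImage(image, 0, 0);

  // 2. Fill with the tint colour. 'source-atop' only paints where the
  //    scratch canvas already has content, so transparent areas stay clear.
  //    (Assumed operation; other modes such as 'multiply' work similarly.)
  scratch.globalCompositeOperation = 'source-atop';
  scratch.fillStyle = tint;
  scratch.fillRect(0, 0, width, height);

  // 3. Copy the tinted result to the main canvas. This canvas-to-canvas
  //    draw is the step that shows the framerate drop.
  main.drawImage(scratch.canvas, x, y);
}
```

Per frame, this would be called once per image with that frame's tint, for example drawTinted(mainCtx, hiddenCtx, image, 'rgba(255, 0, 0, 0.5)', 10, 10).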

What am I doing wrong? Is there a way to work around this performance issue?

I have tried drawing only onto canvas B and, in a separate test, only drawing B to the main canvas. Neither seems to cause any noticeable performance issue on its own. Only the combination of drawing to B and then drawing B to the main canvas triggers the problem.

Context: I am using an Apple silicon M1 chip, running the latest version of Chrome.

TL;DR Reproducible example

https://jsfiddle.net/t920zro8/1/

draw2() is more than 10x slower than draw1(), despite only having 2x the number of operations. (In my own project, the difference is far more than 10x, but this example seems to reflect the same idea.)

It seems that tinting with globalCompositeOperation is not the main culprit; the act of drawing the second canvas onto the main canvas is.

Solution

TL;DR: Use the willReadFrequently: true attribute when getting the 2D context of the canvas that gets copied. Example:

hiddenCanvas.getContext('2d', {willReadFrequently: true});

Note: This attribute is not yet supported on Safari.
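Browsers that don't know the option simply ignore it, so passing it is harmless. If you want to check whether it was actually applied, one possible sketch is below. The helper name honorsWillReadFrequently is made up for this example, and it assumes the browser implements getContextAttributes() on 2D contexts, which not every browser does:

```javascript
// Hypothetical helper: reports whether a 2D context was created with
// willReadFrequently applied.
// Returns true or false when the browser exposes getContextAttributes(),
// and undefined when it does not (so the answer is unknown).
function honorsWillReadFrequently(ctx) {
  if (typeof ctx.getContextAttributes !== 'function') {
    return undefined;
  }
  return ctx.getContextAttributes().willReadFrequently === true;
}

// Usage in a browser (hiddenCanvas is the canvas that gets copied):
//   const ctx = hiddenCanvas.getContext('2d', { willReadFrequently: true });
//   console.log(honorsWillReadFrequently(ctx));
```

Note that the option only takes effect on the first getContext('2d') call for a given canvas; later calls return the same context regardless of the options passed.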

I'm not an expert on how browsers work internally, but based on my limited knowledge of their performance optimizations (and from looking a little at the WebKit and Chromium source code), my best guess is this: drawing an image to a canvas is fast because images are static, so their data is cached and easy to copy on the CPU. Canvases, on the other hand, are very dynamic, buffered on the GPU, and likely aren't cached as efficiently. Therefore, when you draw the contents of one canvas onto another, the browser has to do additional work to copy the buffer from the GPU to the CPU (I'm guessing it goes down a code path similar to getImageData(), which is also slow).

If you really need to manage and synchronize two canvases, you can tell the browser that you will read the contents of one canvas frequently by passing the willReadFrequently context attribute when calling getContext('2d'). In short, this option makes the browser render that canvas on the CPU instead of the GPU. It should only be applied to the context of the canvas that is being copied, which in your linked example is the hidden canvas. In the snippet below, you can see that draw1() and draw2() now perform much more similarly:

const hidden = document.getElementById('hidden').getContext('2d', {
  willReadFrequently: true
});

const main = document.getElementById('main').getContext('2d')
const image = document.getElementById('image')
const result = document.getElementById('result')

// Draw image onto screen
function draw1() {
  main.drawImage(image, 10, 10)
  main.drawImage(image, 10, 10)
}

// Draw image onto second 8x8 canvas, and then onto main
function draw2() {
  hidden.drawImage(image, 0, 0)
  main.drawImage(hidden.canvas, 10, 10)
}

function measurePerformance(fn) {
  const end = performance.now() + 100
  let i = 0
  while (performance.now() < end) {
    fn()
    i++
  }
  return i * 10 // Runs per second (extrapolated from a 100 ms sample)
}

result.textContent = `
Simple draw: ${measurePerformance(draw1).toLocaleString()} draw/s
Draw on second canvas first: ${measurePerformance(draw2).toLocaleString()} draw/s
`.trim()

body {
  font-family: monospace;
  background: black;
  color: white;
}

#result {
  white-space: pre-wrap;
}

<canvas id="hidden" width="8" height="8"></canvas>
<canvas id="main" width="100" height="100"></canvas>

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAICAYAAADED76LAAAAAXNSR0IArs4c6QAAADhJREFUKFNjZICC/////4exQTQjIyMjmAYR6JIwhSBFjLgk4YqQFcCMRREjSQGyI+FWEHQkIW8CAMzHJ/nFK++nAAAAAElFTkSuQmCC" id="image" hidden />

<div id="result"></div>

By using the CPU for rendering, you get significantly better cache locality when reading the canvas buffer, since it doesn't need to be constantly copied back from the GPU. However, rendering on the CPU makes the actual drawing slower, so apply it sparingly (i.e. only in situations where you actually start seeing performance issues). In this case, the GPU-to-CPU transfer is the main bottleneck, so the benefits of willReadFrequently outweigh the drawbacks.
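Since whether the option helps depends on the workload, it's worth measuring both configurations before committing to one. A rough sketch, where callsPerSecond and compareReadbackHint are made-up helper names and the canvas factory is passed in as a parameter (in a browser you would pass () => document.createElement('canvas')):

```javascript
// Runs `fn` repeatedly for `durationMs` and extrapolates to calls per second.
function callsPerSecond(fn, durationMs = 100) {
  const end = performance.now() + durationMs;
  let calls = 0;
  while (performance.now() < end) {
    fn();
    calls++;
  }
  return calls * (1000 / durationMs);
}

// Hypothetical helper: benchmarks "draw to scratch canvas, then copy to main"
// with and without willReadFrequently. A fresh canvas is created for each
// variant because the option only applies to the first getContext() call.
function compareReadbackHint(createCanvas, image, mainCtx, durationMs = 100) {
  const results = {};
  for (const willReadFrequently of [false, true]) {
    const canvas = createCanvas();
    canvas.width = image.width;
    canvas.height = image.height;
    const ctx = canvas.getContext('2d', { willReadFrequently });
    const key = willReadFrequently ? 'withHint' : 'withoutHint';
    results[key] = callsPerSecond(() => {
      ctx.drawImage(image, 0, 0);
      mainCtx.drawImage(canvas, 10, 10);
    }, durationMs);
  }
  return results;
}
```

Like the snippet above, this only times the calls from script, so treat the numbers as a rough comparison rather than a precise measurement.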