javascript, async-await, promise

Consider the last promise in a chain for Promise.all


I want to export HTML content from Confluence pages. These can contain <img> tags whose src attributes are ordinary hyperlinks. Since I want to export the images as well, I decided to replace each src value with its corresponding data URL, so that the result is src="…".

This requires fetching the images via HTTP, of course, which can only be done asynchronously. It also involves a lot of "nested" asynchronous calls.

This is my code so far:

    /**
     * @param {HTMLTableCellElement | undefined} cell
     */
    async #getCellHtml(cell) {
        if (!cell) return undefined;

        const srcMap = {}

        for await (const imgElement of cell.querySelectorAll('img')) {
            if ("attachment" !== imgElement.dataset.linkedResourceType) {
                return;
            }
            const imgUrl =
                new URL(imgElement.src, imgElement.dataset.baseUrl);
            await fetch(imgUrl)
                .then(response => response.blob())
                .then(blob => blob.arrayBuffer())
                .then(arrayBuffer => {
                    srcMap[imgElement.src] =
                        `data:${imgElement.dataset.linkedResourceContentType};base64,`
                        + Buffer.from(arrayBuffer).toString('base64');
                });
        }

        const cellHtml = cell.innerHTML;

        Object.entries(srcMap).forEach(([imgSrc, dataUrl]) => {
            cellHtml.replace(imgSrc, dataUrl)
        })

        return cellHtml;
    }

For reference, such HTML looks like the following:

<p style="text-align: left;"><br/></p>
<p style="text-align: left;"><span
        class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img
        class="confluence-embedded-image" draggable="false" width="639"
        src="/confluence/download/attachments/2345432345/image-2024-7-11_16-48-22-1.png?version=1&amp;modificationDate=1720709302000&amp;api=v2"
        data-image-src="/confluence/download/attachments/235432345/image-2024-7-11_16-48-22-1.png?version=1&amp;modificationDate=1720709302000&amp;api=v2"
        data-unresolved-comment-count="0" data-linked-resource-id="345654345"
        data-linked-resource-version="1" data-linked-resource-type="attachment"
        data-linked-resource-default-alias="image-2024-7-11_16-48-22-1.png"
        data-base-url="https://suite.acme.com/confluence"
        data-linked-resource-content-type="image/png"
        data-linked-resource-container-id="1491043790"
        data-linked-resource-container-version="1" alt=""/></span></p>
<p style="text-align: left;"><br/></p>
<p style="text-align: left;"><br/></p>

My intention is to loop through all <img> elements, pick out the relevant ones, fetch their image data, and collect the replacements. Afterwards, I'd just replace every match with its respective data URL.

What I think I would want is something like this:

cell.querySelectorAll('img').map(cell => {
  // return a Promise that combines all the fetching etc.
  // so that it resolves() with returning the base64 string(!).
  return new Promise()…
});

After I map()ped this array to Promises I could Promise.all() and do the replacement of the HTML then.

I have no idea how to "return" that last promise after all the other ones have already fulfilled. Should my code use await rather than .then() invocations so I don't end up in a callback context?


Solution

  • A few remarks on your current code

    1. for await (const imgElement of cell.querySelectorAll('img')): querySelectorAll is not async, so you don't need for await (...); a plain for (...) loop is enough.

    2. if ("attachment" !== imgElement.dataset.linkedResourceType) { return; } will exit the method on the first element that does not meet this condition and leave all other elements unhandled. Moreover, the images already loaded won't be replaced, because you never reach the code after the loop. Use continue instead of return to skip the current element and carry on with the next element in the list.

    3. You shouldn't mix async/await with then/catch unless you know exactly what you are doing, because it causes confusion and can lead to unexpected behaviour.
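
To make remark 2 concrete, here is a minimal sketch in which plain objects stand in for the <img> elements. The `type` field and both function names are made up for illustration; the point is only that `return` abandons the remaining items while `continue` skips just the current one.

```javascript
// Stand-ins for <img> elements; `type` mimics dataset.linkedResourceType.
const items = [
  { src: 'a.png', type: 'attachment' },
  { src: 'b.png', type: 'other' },
  { src: 'c.png', type: 'attachment' },
];

// Mirrors the original loop: `return` leaves the function at the first mismatch.
function collectWithReturn(list) {
  const seen = [];
  for (const item of list) {
    if (item.type !== 'attachment') return;
    seen.push(item.src);
  }
  return seen;
}

// `continue` skips only the non-matching item and keeps looping.
function collectWithContinue(list) {
  const seen = [];
  for (const item of list) {
    if (item.type !== 'attachment') continue;
    seen.push(item.src);
  }
  return seen;
}

console.log(collectWithReturn(items));   // undefined
console.log(collectWithContinue(items)); // [ 'a.png', 'c.png' ]
```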

    That being said, I'd refactor your code to the following.

    1. As #getCellHtml(cell) is already async, I'd switch completely to await and drop all the .then(...) calls.

    2. Replace your for loop over the elements with a Promise.all(). You don't really need the result of that Promise.all: if it doesn't throw, you know that all promises have resolved successfully. And since each callback sets its own value in the srcMap object, you know that all images have been loaded once the Promise.all() has resolved.

    ...
    
    const srcMap = {};
    
    // querySelectorAll returns a NodeList, which has no .map(); convert it to an array first
    await Promise.all(Array.from(cell.querySelectorAll('img')).map(async c => {
      if ("attachment" !== c.dataset.linkedResourceType) {
        //ignore wrong resource types and do nothing
        return;
      }
    
      //for the correct resource type, load the image and update the `srcMap` object
      const 
        imgUrl = new URL(c.src, c.dataset.baseUrl),
        resp = await fetch(imgUrl),
        blob = await resp.blob(),
        buff = await blob.arrayBuffer();
      
      srcMap[c.src] = ...
    }));
    
    const cellHtml = cell.innerHTML;
    
    ...
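
As a fuller sketch of the same idea, each mapped promise below resolves with its own `[src, dataUrl]` pair, which is roughly what the question asks for, instead of writing into a shared object. Several things here are assumptions rather than part of the original code: the helper names `buildSrcMap` and `applySrcMap`, the `fetchFn` parameter (it defaults to the global `fetch` and exists only to make testing easier), and the use of `resp.arrayBuffer()` directly instead of going through a blob. `Array.from` turns the `NodeList` into a real array so that `.filter()` and `.map()` are available.

```javascript
// Hypothetical helper: resolves to an object mapping each image src to a data URL.
// `imgs` is anything Array.from accepts, e.g. cell.querySelectorAll('img').
// `fetchFn` defaults to the global fetch; it is a parameter only to ease testing.
async function buildSrcMap(imgs, fetchFn = fetch) {
  const entries = await Promise.all(
    Array.from(imgs)
      // Filter first, so every mapped promise resolves with a real entry.
      .filter(img => img.dataset.linkedResourceType === 'attachment')
      .map(async img => {
        const imgUrl = new URL(img.src, img.dataset.baseUrl);
        const resp = await fetchFn(imgUrl);
        const buff = await resp.arrayBuffer();
        const base64 = Buffer.from(buff).toString('base64');
        // Whatever an async callback returns is what its promise resolves with.
        return [
          img.src,
          `data:${img.dataset.linkedResourceContentType};base64,${base64}`,
        ];
      })
  );
  return Object.fromEntries(entries);
}

// Hypothetical helper: strings are immutable, so replace() and replaceAll()
// return a new string that has to be kept; they never change `html` itself.
function applySrcMap(html, srcMap) {
  let result = html;
  for (const [imgSrc, dataUrl] of Object.entries(srcMap)) {
    // A function replacer avoids special handling of `$` in the replacement.
    result = result.replaceAll(imgSrc, () => dataUrl);
  }
  return result;
}
```

Inside `#getCellHtml` this could be used as `return applySrcMap(cell.innerHTML, await buildSrcMap(cell.querySelectorAll('img')));`. Whether the keys match the serialized HTML character for character (for instance where `innerHTML` writes `&amp;` in a query string) is not checked by this sketch and would need verifying against real pages.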
    

    Of course, this code has no error handling whatsoever, so if a single image fails to load, the whole process throws. I leave adding that error handling to you as an exercise.
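
One possible starting point for that exercise is `Promise.allSettled`, which waits for every promise and reports each outcome instead of rejecting on the first failure. The sketch below assumes that "error handling" means skipping the images that fail and keeping the rest, which may not be what you want; `loadAllTolerant` and `loadOne` are hypothetical names.

```javascript
// Hypothetical helper: runs `loadOne` for every item and keeps only the successes.
// `loadOne` is assumed to be an async function returning a [key, value] pair.
async function loadAllTolerant(items, loadOne) {
  const results = await Promise.allSettled(items.map(item => loadOne(item)));
  const loaded = {};
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      const [key, value] = result.value;
      loaded[key] = value;
    } else {
      // result.reason holds whatever the promise rejected with
      failed.push({ item: items[i], reason: result.reason });
    }
  });
  return { loaded, failed };
}
```

Note that `fetch` only rejects on network-level failures; an HTTP 404 still resolves, so checking `resp.ok` inside `loadOne` and throwing when it is false is a sensible addition.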