
Do promises and async/await create a lot of threads when batch processing, and are they better than the synchronous version?


Suppose a JavaScript program written using Node.js loops through all employees, fetches some data, does some calculations, and posts the results back to another server:

// without error handling to keep it simple for now

for (let employee of employees) {
  new Promise(function(resolve) {
    fetch(someUrlToServerA + employee.id).then(resolve);
  }).then((data) => {
    let result = doCalculations(data);
    return postData(someUrlToServerB + employee.id, result);
  }).then(() => console.log("Finished for", employee.id));
}
console.log("All done.");

If written using async/await, it may be roughly equivalent to:

(async function(){
  for (let employee of employees) {
    data = await fetch(someUrlToServerA + employee.id);

    let result = doCalculations(data);
    await postData(someUrlToServerB + employee.id, result);

    console.log("Finished for", employee.id);
  }
  console.log("All done.");
})();

Let's say there are 6000 employees: won't the program (running under Node.js) keep making requests to ServerA, and in fact print "All done." almost instantly (maybe within seconds), but now have 6000 threads all trying to get data from ServerA, do the calculations, and post to ServerB? Would there be a better way to do it?

It seems there might be some benefit to making requests in parallel: if each request to ServerA takes 3 seconds, then making parallel requests will probably save time if ServerA can return 4 responses within those 3 seconds. But if ServerA is sent too many requests at the same time, it may just queue them up and only be able to process a few at a time. Or does the system actually limit the number of simultaneous fetches by capping the number of concurrent connections? Let's say it limits them to 4 connections at a time: then "All done." is printed quickly, but internally 4 employees are being processed at a time, so it is alright? If ServerA and ServerB don't complain about receiving several requests at the same time, and the calculations take only milliseconds to finish, then this method may take 1/4 of the time of the synchronous version?


Solution

  • First of all, JavaScript engines typically execute your code on a single thread, whether you use promises or not. Multiple threads come into play when you use Web Workers, and in the lower-level, non-JavaScript code that JavaScript relies on (file I/O, HTTP request handling, etc.).
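
    A small sketch can make the single-thread point concrete: even an already-resolved promise never runs its callback in the middle of your synchronous code; the callback is queued and only runs after the currently executing code has run to completion.

    ```javascript
    // Illustration: promise callbacks run on the same single thread,
    // only after the currently running synchronous code has finished.
    let order = [];
    Promise.resolve().then(() => order.push("promise callback"));
    order.push("synchronous code");
    // The .then callback has not run yet at this point; it runs once
    // the call stack is empty. setTimeout lets us look afterwards:
    setTimeout(() => console.log(order), 0);
    // logs: ["synchronous code", "promise callback"]
    ```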

    The first piece of code is not well designed, as the for loop executes synchronously, so the next iteration will not wait for the promise of the previous iteration to resolve.

    Because of this, the requests will indeed all be triggered at almost the same time, and "All done." will be output synchronously (immediately). A server may complain about the many requests it gets in a very short time. Often servers set a maximum limit on the number of requests per time unit, or (in the worst case) they may just go down under the load.

    Also:

    • You are employing the promise constructor antipattern: don't create a new Promise when you already have a promise (returned by fetch)

    • The promise returned by fetch does not resolve to the data directly. Instead it resolves to a response object that exposes methods to get to the data asynchronously.

    Here is a possible way to chain the promises, so the next fetch only happens when the previous one has received its response:

    let promise = Promise.resolve();
    for (let employee of employees) {
        promise = promise.then(() => fetch(someUrlToServerA + employee.id))
            .then((response) => response.json()) // assuming you get data as JSON
            .then((data) => postData(someUrlToServerB + employee.id, doCalculations(data)))
            .then(() => console.log("Finished for", employee.id));
    }
    promise.then(() => console.log("All done."));
    

    Asynchronous "recursion"

    The above solution creates all promises in one sweep. To delay the creation of promises until they are really necessary, you could create an asynchronous loop:

    (function loop(i) {
        if (i >= employees.length) {
            console.log("All done.");
            return;
        }
        let employee = employees[i];
        fetch(someUrlToServerA + employee.id)
            .then((response) => response.json()) // assuming you get data as JSON
            .then((data) => postData(someUrlToServerB + employee.id, doCalculations(data)))
            .then(() => console.log("Finished for", employee.id))
            .then(() => loop(i+1));
    })(0);
    

    The async await version

    Because of the async and await keywords, the for loop here does not do all iterations synchronously, but only gets to the next iteration when the promises created in the previous iteration have been resolved. The second code snippet is a better version than the first when it comes to doing things one after the other. Again, it misinterprets the value that the fetch promise resolves to. It resolves to a response object, not to the data. You should also declare data as a variable or it will be global (in sloppy mode):

    (async function(){
        for (let employee of employees) {
            let response = await fetch(someUrlToServerA + employee.id);
            let data = await response.json();
            let result = doCalculations(data);
            await postData(someUrlToServerB + employee.id, result);
            console.log("Finished for", employee.id);
        }
        console.log("All done.");
    })();
    

    Running in parallel

    Although JavaScript cannot execute multiple lines of its code in parallel, the underlying APIs (which may rely on non-JS code and Operating System calls) can operate in parallel. So indeed the processes that deal with HTTP requests and inform JavaScript (via its event queue) that a request has a response, can run in parallel.

    If you want to go that way, then you should initiate some (or all) fetch calls synchronously, and use Promise.all to wait for all those returned promises to resolve.

    Your first piece of code would then need to be rewritten as:

    let promises = [];
    for (let employee of employees) {
        promises.push(fetch(someUrlToServerA + employee.id)
            .then((response) => response.json()) // assuming you get data as JSON
            .then((data) => postData(someUrlToServerB + employee.id, doCalculations(data)))
            .then(() => console.log("Finished for", employee.id)));
    }
    Promise.all(promises).then(() => console.log("All done."));
    

    Limiting parallelism

    If you want a hybrid solution, where the number of pending promises is limited to, let's say, 4, then you need to combine the use of Promise.all (working on an array of 4 promises) with the chaining that happens in the first code block (using promise = promise.then()).

    I'll leave that for you to design. If you have an issue with getting that to work, you can come back with a new question.
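
    For illustration only, here is one possible sketch of such a limiter (just one way to do it, without error handling, like the snippets above): a pool of chained "workers" pulling from a shared index, where `processEmployee` is a hypothetical stand-in for the fetch / calculate / post sequence for a single employee.

    ```javascript
    // Sketch: run `task` over `items` with at most `limit` tasks
    // pending at any moment. Each worker is a promise chain that
    // keeps pulling the next item until the list is exhausted.
    function runLimited(items, limit, task) {
        let index = 0;
        function worker() {
            if (index >= items.length) return Promise.resolve();
            let item = items[index++];
            return Promise.resolve(task(item)).then(worker);
        }
        let workers = [];
        for (let i = 0; i < Math.min(limit, items.length); i++) {
            workers.push(worker());
        }
        return Promise.all(workers);
    }
    ```

    It would be used as `runLimited(employees, 4, processEmployee).then(() => console.log("All done."));`, so "All done." only prints once every employee has been processed, while no more than 4 are in flight at once.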