Search code examples
javascriptnode.jspromisebluebird

How to break out of a serial loop when using promises?


I've got a long text file that I loop through line by line to extract some event data and store it in a database. The file periodically gets updated with new data at the top. When that happens, I run through the file again extracting the new events, but I want to stop when I get to an event that's already in the database (the file is always ordered newest to oldest).

Using the reduce() approach described in this answer to the question Correct way to write loops for promise, I've come up with this function to parse the file:

function parse(
    file)
{
    var lines = file.split("\n"),
        latestDate;

    return lines.reduce(function(promise, line) {
        return promise.then(function() {
            if (/* line matches date pattern */) {
                latestDate = line;
            } else if (/* line matches event pattern */) {
                return Event.createAsync(line, latestDate);
            }

            return promise;
        });
    }, Promise.resolve())
        .catch({ errorName: "uniqueViolated" }, 
            function() { /* ignore only the createAsync error */ });
}

The createAsync() database method returns a promise that's resolved when the event is saved. It will throw an exception if the event already exists in the database, which stops the promise chain so the rest of the file isn't parsed. That exception is caught and ignored by the catch() handler at the end of the function. I'm using the Bluebird 3.0 promise library in Node.js.

This function does loop through each line serially and correctly stops when it hits an already saved event. But I'm wondering if this is the best way to break out of a loop while dealing with promises. Swallowing the thrown exception at the end of the function seems a bit kludgy.

Any suggestions for improving the loop handling are welcome.

Solution?

Building on jib's answer, and taking into account Bergi's comment that maybe I should've just tried his non-reduce answer to the question I linked to :), I came up with this solution:

function parse(
    file)
{
    var lines = file.split("\n"),
        latestDate;

    return promiseEach(lines, function(line) {
        if (/* line matches date pattern */) {
            latestDate = line;
        } else if (/* line matches event pattern */) {
            return Event.createAsync(line, latestDate)
                .catch({ errorType: "uniqueViolated" }, function() { return false; });
        }
    });
}

The loop recursion is moved into a generic function, promiseEach(), that loops over every item in an array. If the iterator function returns a promise, the next item isn't processed until that promise resolves. If the iterator returns false, then the loop ends, Lo-dash style:

function promiseEach(
    list,
    iterator,
    index)
{
    index = index || 0;

    if (list && index < list.length) {
        return Promise.resolve(iterator(list[index])).then(function(result) {
            if (result !== false) {
                return promiseEach(list, iterator, ++index);
            }
        });
    } else {
        return Promise.resolve();
    }
}

I think that does what I want, but I'm wondering if there'll be call stack issues if I run it over a 4000-line file.


Solution

  • What you have doesn't actually break out of a loop at all.

    Every call to Event.createAsync returns successfully right away with a promise, which means you always reduce over the entire array.

    The length of the promise-chain produced by this loop will therefore always be the total number of lines in the file, less the number of lines that fit neither the date nor event pattern in your particular logic.

    It's the asynchronous execution of this promise-chain that's later terminated when an error is thrown because an event already exists in the database.

    Your code works, but you said this was a long text file, so it might be inefficient, especially if breaking out early is the norm rather than the exception (which it sounds like from your description).

    I would therefore consider a recursive approach instead:

    function parse(file) {
      var latestDate;
    
      function recurse(lines, i) {
        if (i >= lines.length) return Promise.resolve();
    
        var line = lines[i];
        if (/* line matches date pattern */) {
          latestDate = line;
        } else if (/* line matches event pattern */) {
          return Event.createAsync(line, latestDate).then(() => recurse(lines, i + 1));
        }
        return recurse(lines, i + 1);
      }
    
      return recurse(file.split("\n"), 0);
    }
    

    The benefit of a recursive approach is that the promise chain is extended asynchronously, when Event.createAsync resolves, and only as needed. You can also merely stop calling recurse to stop, i.e. there's no need for Event.createAsync to throw an exception to break out.

    A way to visualize the difference may be to compare it to laying down tracks for a train, where the track represents the promise-chain, and the train represents the execution of the asynchronous operations that are promised:

    With reduce, you always lay down the entire track first before the train starts, regardless of how far the train ends up going down the track before an exception stops it. You eat the cost of laying down the entire track each time (it may not be much, but it can add up).

    In the recurse example, you're laying down the next piece of track just-in-time in front of the moving train, like Gromit in the finale of "The Wrong Trousers", so no time is wasted laying tracks that wont be needed.