javascript iterator v8 iterable callstack

Maximum call stack size exceeded for large iterators

I'm trying to convert an iterator to an array. The iterator is the result of calling matchAll on a very long string. The iterator (I assume) has many matches within the string. First I tried it with the spread operator:

const array = [...myLongString.matchAll(/myregex/g)];

This gave me the error: RangeError: Maximum call stack size exceeded

So I tried iterating via next():

const safeIteratorToArray = (iterator) => {
    const result = [];
    let item = iterator.next();

    while (!item.done) {
        result.push(item.value);
        item = iterator.next();
    }

    return result;
};

But this gives me the same error, on the item = iterator.next() line. So I tried making it async in an effort to reset the call stack:

const safeIteratorToArray = async (iterator) => {
    const result = [];
    let item = iterator.next();

    while (!item.done) {
        result.push(item.value);
        item = await new Promise(resolve => setTimeout(() => resolve(iterator.next())));
    }

    return result;
};

But I still get the same error.

If you are curious about the actual use case:

The regex I'm actually using is:

 /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: ((.|\n)+?)\n\[/g

And the contents of the text file (it's a log file) generally looks like:

[TIMESTAMP] [LOG_LEVEL] [Item ITEM_ID] Success with response: {
    ...put a giant json object here
}

Repeat that ad-nauseam with newlines in between each log.

Solution

(V8 developer here.)

It's not about the iterator, it's about the RegExp.

[Update]
Looks like I was misled by a typo in my test, so my earlier explanation/suggestion doesn't fix the problem. With the test corrected, it turns out that only the end of the expression (which I called "fishy" before) needs to be fixed.

The massive consumption of stack memory is caused by the fact that (.|\n) is a capture group, and is matched very frequently. One idea would be to write it as [.\n] instead, but the . metacharacter is not valid inside [...] character sets.

Hat tip to @cachius for suggesting an elegant solution: use the s flag to make . match \n characters.
As an unrelated fix, prefer checking for the closing } instead of the next opening [ at the beginning of a line, so that there's no overlap between matched ranges (which would make you miss some matches).
So, in summary, replace ((.|\n)+?)\n\[/g with (.+?)\n}/gs.

[/Update]

Here is a reproducible example. The following exhibits the stack overflow:

let lines = ["[TIMESTAMP] [DEBUG] [Item ITEM_ID] {"];
for (let i = 0; i < 1000000; i++) {
  lines.push("  [0]");  // Fake JSON, close enough for RegExp purposes.
}
lines.push("}");
lines.push("[TIMESTAMP]");
let myLongString = lines.join("\n");

const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] ((.|\n)+?)\n\[/g;
myLongString.match(re);

If you replace the const re = ... line with:

const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] (.+?)\n}/gs;

then the stack overflow disappears.

(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)

[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed improves performance by 25% because it makes the RegExp simpler to match, it just isn't enough to fix the stack overflow.]
The problematic pattern is:

\[(.+?)\]

which, after all, means "a [, then any number of arbitrary characters, then a ]". While I understand that regular expressions might seem like magic, they're actually real algorithmic work under the hood, kind of like miniature programs in their own right. In particular, any time a ] is encountered in the string, the algorithm has to decide whether to count this as one of the "arbitrary characters", or as the one ] that ends this sequence. Since it can't magically know that, it has to keep both possibilities "in mind" (=on the stack), pick one, and backtrack if that turns out to be incorrect. Since this backtracking information is kept on the stack (where else?), if you put sufficiently many ] into your string, the stack will run out of space.

Luckily, the solution is simple: since what you actually mean is "a [, then any number of characters that aren't ], then a ]", you can just tell the RegExp engine that, replacing . with [^\]]:

\[([^\]]+?)\]

Note: ((.|\n)+?)\n\[ seems fishy for the same reason, but according to this test doesn't appear to be the problem, even if I further increase the input size. I'm not sure why; it might be due to how I created the test. If you see further problems with the real input, it may be worth reformulating this part as well.
[/Original post]