javascript, regex, regex-negation, string-operations

How to find indexes of all non-matching characters with a JS regex?


I have a string, and I want an array of the indexes (positions) of the characters in it that do not match a given regex.

If I write it like this:

let match;
let reg = /[A-Za-z]|[0-9]/g;
let str = "1111-253-asdasdas";
let indexes = [];

do {
    match = reg.exec(str);
    if (match) indexes.push(match.index);
} while (match);

It works. It returns the indexes of all the characters that are numeric or alphabetic. The problem is that if I try to do the opposite with a negative lookahead, like this:

let match;
let reg = /(?!([A-Za-z]|[0-9]))/g;
let str = "1111-253-asdasdas";
let indexes = [];

do {
    match = reg.exec(str);
    if (match) indexes.push(match.index);
} while (match);

It ends up in an infinite loop.

What I'd like is the counterpart of the first case, but for the characters that do not match, so in this case the result would be:

indexes = [4, 8]; // the indexes at which a non-alphanumeric character appears

Is the loop wrong, or is the regex the one messing things up? Maybe exec does not work with negative lookaheads?

I would understand the regex not working as I intended (it may be badly written), but I don't understand the infinite loop, which makes me think that exec may not be the best way to achieve what I'm looking for.


Solution

  • Reason

    The infinite loop is easy to explain: the regex has the g modifier, so each call to exec looks for the next occurrence of the pattern, starting where the previous successful match ended, that is, at the lastIndex value.

    See the exec documentation:

    If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property

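    However, your pattern matches only an empty string, so after each match lastIndex is equal to the match index. Since the loop never checks whether match.index equals lastIndex, every call starts at the same position and finds the same empty match again, and the regex never advances through the string.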
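
The stall can be observed directly. The following is a minimal sketch using the pattern and string from the question; the attempt cap `MAX_ATTEMPTS` and the `seen` array are hypothetical names added only so that the demo terminates and can be inspected.

```javascript
// The pattern from the question: a zero-width negative lookahead.
const stuckReg = /(?!([A-Za-z]|[0-9]))/g;
const stuckStr = "1111-253-asdasdas";

// Hypothetical cap so this demo terminates; the original loop has none.
const MAX_ATTEMPTS = 3;
const seen = [];

for (let i = 0; i < MAX_ATTEMPTS; i++) {
    const match = stuckReg.exec(stuckStr);
    if (!match) break;
    // After an empty match, lastIndex equals match.index,
    // so the next exec call starts at the very same position.
    seen.push({ index: match.index, lastIndex: stuckReg.lastIndex });
}

console.log(seen);
```

Every recorded attempt reports the same index and the same lastIndex, which is why the uncapped loop in the question never ends.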

    Solution

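    Use a regex that matches any non-alphanumeric character, /[\W_]/g (the _ is listed explicitly because \W does not cover the underscore). Since this pattern always consumes one character and never matches an empty string, the lastIndex property of the RegExp object moves forward with each match and no infinite loop occurs.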

    JS demo:

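    let match, indexes = [];
    let reg = /[\W_]/g; // one non-alphanumeric character; never an empty match
    let str = "1111-253-asdasdas";

    // exec returns null once no further match exists, which ends the loop
    while ((match = reg.exec(str)) !== null) {
        indexes.push(match.index);
    }
    console.log(indexes); // [4, 8]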
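
As an alternative to the explicit exec loop, the same indexes can probably be collected with String.prototype.matchAll, assuming an ES2020+ environment. This is only a sketch: the negated class [^A-Za-z0-9] is swapped in as an equivalent of [\W_], and the names `text` and `nonAlnumIndexes` are made up for the example.

```javascript
const text = "1111-253-asdasdas";

// [^A-Za-z0-9] consumes exactly one character, so no match is ever empty.
// matchAll requires the g flag and does its own lastIndex bookkeeping.
const nonAlnumIndexes = Array.from(
    text.matchAll(/[^A-Za-z0-9]/g),
    (m) => m.index
);

console.log(nonAlnumIndexes); // [4, 8]
```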

    Also, see how to move the lastIndex property value manually.
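
If the zero-width lookahead from the question has to be kept, moving lastIndex by hand could look roughly like the sketch below. The helper name `zeroWidthMatchIndexes` is hypothetical, and the check against `str.length` is an assumption about the desired result: the lookahead also succeeds at the very end of the string, where there is no character to report.

```javascript
// Hypothetical helper: collects the indexes of matches for a global regex
// that may match the empty string.
function zeroWidthMatchIndexes(reg, str) {
    if (!reg.global) throw new TypeError("reg must have the g flag");
    const found = [];
    reg.lastIndex = 0;
    let match;
    while ((match = reg.exec(str)) !== null) {
        // Skip the empty match at the end of the string: no character sits there.
        if (match.index < str.length) found.push(match.index);
        // exec leaves lastIndex on the match index after an empty match,
        // so step forward manually to avoid matching the same spot forever.
        if (match.index === reg.lastIndex) reg.lastIndex++;
    }
    return found;
}

const lookaheadIndexes = zeroWidthMatchIndexes(
    /(?!([A-Za-z]|[0-9]))/g,
    "1111-253-asdasdas"
);
console.log(lookaheadIndexes); // [4, 8]
```

Without the end-of-string check the result would also contain 17, the length of the string, so the character-class approach above is usually the simpler choice.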