Okay, so I have this string "nesˈo:tkʰo:x", and I want to get the indices of all the zero-width positions that don't occur after any instance of the character ˈ (the IPA primary stress symbol). So in this case, the expected output would be 0, 1, 2, and 3: the indices of the letters nes that occur before the one and only instance of ˈ, plus the ˈ itself.
I'm doing this with regex for reasons I'll get into in a bit. Regex101 confirms that /(?=.*?ˈ)/ should match all 4 of those zero-width positions with JS' regex flavor... but I can't actually get JS to return them.
A simple setup might look like this:
let teststring = "nesˈo:tkʰo:x";
let re = new RegExp("(?=.*?ˈ)", "g");
while (result = re.exec(teststring)) {
    console.log("Match found at "+result.index);
}
...except that this loops forever. It seems to get stuck on the first match, which I understand has something to do with how RegExp.exec is supposed to auto-increment RegExp.lastIndex for global regexes, or something. But I also can't make the regex not global, or it won't return all the matches for strings like this where more than one match is expected.
Okay, so what if I manually increment RegExp.lastIndex to prevent it from looping?
let teststring = "nesˈo:tkʰo:x";
let re = new RegExp("(?=.*?ˈ)", "g");
while (result = re.exec(teststring)) {
    if (result.index == re.lastIndex) {
        re.lastIndex++;
    } else {
        console.log("Match found at "+result.index);
    }
}
Now it... prints out nothing at all. Now, to be fair, if lastIndex starts at 0 by default, and the index of the first match is 0, I half expect that to be skipped over... but why isn't it at least giving me 1, 2 and 3 as matches?
Now, I can already hear the chorus of "you don't need regex for this, just do Array(teststring.indexOf("ˈ") + 1).keys() or something to generate [0,1,2,3]". That may work for this specific example, but the actual use case is a parser function that's supposed to be a general solution for "for this input string, replace all instances of A with B, if condition C is true, unless condition D is true". Those conditions might be something like "if A is at the end of the string" or "if A is right next to another instance of A" or "if A is between 'n' and 't'". That kind of complicated string matching problem is why the parser creates and executes regexes on the fly and why regex is getting involved, and it does work for almost everything except this one annoying edge case, which I'd rather not have to refactor the entire mechanism of the parser to deal with if I don't have to.
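Use String.prototype.matchAll() to get all the matches. Its iterator moves lastIndex forward by itself whenever a match is empty, so zero-width matches can't stall it the way a bare exec loop does. Note that matchAll requires the regex to have the g flag.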
let teststring = "nesˈo:tkʰo:x";
let re = new RegExp("(?=.*?ˈ)", "g");
[...teststring.matchAll(re)].forEach(result =>
    console.log("Match found at " + result.index)
);
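If you'd rather keep the exec loop, here is a rough sketch of how it could be made to work. The second attempt in the question prints nothing because, for a zero-width match, result.index is always equal to re.lastIndex, so the if branch runs every time and the else branch with the console.log never does. Recording the match first and only then stepping past an empty match avoids that. The helper name matchIndices is made up for illustration and is not part of the question's parser.

```javascript
// Hypothetical helper (not from the original post): collects the start index
// of every match, including zero-width ones, with a manual exec() loop.
function matchIndices(pattern, input) {
  // Work on a fresh global copy so the caller's lastIndex is left untouched.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const indices = [];
  let result;
  while ((result = re.exec(input)) !== null) {
    // Always record the match first...
    indices.push(result.index);
    // ...then step past it by hand if it was empty, because exec() leaves
    // lastIndex unchanged after a zero-width match.
    if (result[0] === "") {
      re.lastIndex++;
    }
  }
  return indices;
}

const teststring = "nesˈo:tkʰo:x";
console.log(matchIndices(/(?=.*?ˈ)/, teststring)); // logs [0, 1, 2, 3]
```

For this input it should give the same indices as the matchAll version above; the only real difference is that the bookkeeping is spelled out instead of being left to the iterator.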