I have a tiny regex: `foo(\b)?`. This was meant to be an experiment to see whether I can deduce the existence of the boundary just by checking whether the first group matched (resulting in an empty string) or not.
I tried this with some languages: PHP/Python/Java/C#/Rust (input manually). All of them behave as expected: an empty string for the first match and `null`/`None`/nothing for the second.
I can't figure out how to write a proper snippet in Go and C++, but regex101 says Go behaves the same way as those; I'm unsure about C++.
However, this is not the case with JS: it outputs `undefined` for group 1 in both matches against `foo food`.
```javascript
console.log(...'foo food'.matchAll(/foo(\b)?/g));
```
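To rule out the possibility that `\b` simply does not hold at that position, the boundary can be tested independently at the end of each match. This is a minimal sketch using a sticky regex (`/\b/y`, which only tries to match at its `lastIndex`); the variable names are my own:

```javascript
const input = 'foo food';
const matches = [...input.matchAll(/foo(\b)?/g)];

// Group 1 of every match, as JavaScript reports it.
const groups = matches.map((m) => m[1]);

// Independently check for a word boundary exactly where each match ends.
// A sticky regex only attempts a match at its lastIndex.
const boundary = /\b/y;
const boundaryHolds = matches.map((m) => {
  boundary.lastIndex = m.index + m[0].length;
  return boundary.test(input);
});

console.log(groups);        // [ undefined, undefined ]
console.log(boundaryHolds); // [ true, false ]
```

So the boundary is really there after the first `foo`, yet group 1 is still `undefined`.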
Yet `(\b)` without `?` does capture an empty string.
```javascript
console.log(...'foo food'.matchAll(/foo(\b)/g));
```
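Note, though, that without `?` the boundary becomes mandatory, so this pattern cannot answer the original question: the `foo` inside `food` is no longer matched at all. A quick check (the `summary` shape is just for display):

```javascript
const input = 'foo food';
const matches = [...input.matchAll(/foo(\b)/g)];

// Only the standalone "foo" matches; "foo" followed by "d" has no boundary.
const summary = matches.map((m) => ({ index: m.index, group1: m[1] }));

console.log(summary); // [ { index: 0, group1: '' } ]
```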
Considering that `?` is greedy, shouldn't `(\b)` always match and capture an empty string after the first `foo`, as with other languages? What are the alternatives?
I can reproduce this in Node.js and Chrome (V8) as well as in Firefox (Gecko), so this is probably a quirk rather than a bug.
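Greediness itself does not seem to be the problem, and the behaviour does not seem specific to `\b`. In the sketch below (inputs made up for illustration), an optional group is kept whenever it consumes at least one character, and reported as `undefined` whenever the only thing it could match at that position is the empty string:

```javascript
// Hypothetical helper: group 1 of the first match of `re` in `input`.
const group1 = (re, input) => re.exec(input)[1];

const consumed = group1(/foo(d)?/, 'food');        // "d" is consumed, so it is kept
const absent = group1(/foo(d)?/, 'foo bar');       // nothing to match: undefined
const spaces = group1(/foo(\s*)?/, 'foo  bar');    // two spaces consumed, kept
const emptySpaces = group1(/foo(\s*)?/, 'foobar'); // \s* could match "", yet undefined
const emptyBoundary = group1(/foo(\b)?/, 'foo');   // \b matches "", yet undefined

console.log(consumed, absent, JSON.stringify(spaces), emptySpaces, emptyBoundary);
// d undefined "  " undefined undefined
```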
As discussed in both the question and the comments, this is a quirk. I don't know why or how, but I have found an alternative: `foo(?:(\b)|)`. Group 1 results in an empty string if the first branch matched and `undefined` otherwise, effectively disabling this strange behaviour of `?`.
```javascript
[...'foo food'.matchAll(/foo(?:(\b)|)/g)]
// [0: 'foo', 1: '']
// [0: 'foo', 1: undefined]
```
Try it on regex101.com.
Try it:
```javascript
console.log(...'foo food'.matchAll(/foo(?:(\b)|)/g));
```
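With the empty-branch pattern, the original experiment works as intended. A minimal sketch, where `boundaryFlags` is a hypothetical helper name: it reports, for each match, whether group 1 participated (an empty-string capture counts as participating).

```javascript
// Hypothetical helper: true when group 1 took part in the match
// (even if it captured ""), false when it is undefined.
function boundaryFlags(re, str) {
  return [...str.matchAll(re)].map((m) => m[1] !== undefined);
}

const withOptional = boundaryFlags(/foo(\b)?/g, 'foo food');
const withEmptyBranch = boundaryFlags(/foo(?:(\b)|)/g, 'foo food');

console.log(withOptional);    // [ false, false ]
console.log(withEmptyBranch); // [ true, false ]
```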
An empty branch is most often seen as a non-recommended version of `?`,[citation needed] but it seems that the two have some differences after all, at least in ECMAScript.
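As far as I understand the ECMAScript specification, the difference is deliberate: in the RepeatMatcher algorithm used for quantifiers, once the minimum repetition count has been reached (it is 0 for `?`), an iteration that consumes no characters is rejected, and matching continues with zero iterations, where the group is `undefined`. An alternation is not a quantifier, so an empty-matching first branch is kept. The sketch below compares `(X)?` with `(?:(X)|)` for a few sub-patterns chosen for illustration; if that reading is right, the two forms should only disagree when `X` matches the empty string.

```javascript
// Sub-patterns X and inputs used to compare `(X)?` against `(?:(X)|)`.
const cases = [
  { x: 'a', input: 'ab' },  // X consumes "a"
  { x: 'a*', input: 'ab' }, // X consumes "a"
  { x: 'a*', input: 'b' },  // X can only match "" at index 0
  { x: '\\b', input: 'b' }, // \b holds at index 0 and matches ""
];

const rows = cases.map(({ x, input }) => ({
  x,
  optional: new RegExp(`(${x})?`).exec(input)[1],        // quantifier form
  emptyBranch: new RegExp(`(?:(${x})|)`).exec(input)[1], // alternation form
}));

console.table(rows);
```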