I'm completely new to regex and recently started learning it. Here's a part of my test string from which I'd like to find matches.
24 bit:
Black #000000
12 bit:
Black #000
My question is the following. When I use the regex #(\w{1,2}), the group matches 00 in both 24-bit Black and 12-bit Black. However, when I use the regex #(\w{1,2})\1\1, the group matches 00 in 24-bit Black but 0 in 12-bit Black. I'm not familiar with how regex works, and I'm curious about the logic behind this. When I use the curly-brace quantifier {a,b} to indicate a <= (# occurrences) <= b, which of the numbers a, a+1, ..., b is tried first? For example, with #(\w{1,2}) it seems 2 occurrences is tried first. But after adding \1\1, it seems the regex engine was somehow able to see that using 1 occurrence instead of 2 would result in matching 12-bit Black?
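The quantifier {1,2} is greedy: the engine first tries the maximum number of repetitions (2) and gives back one repetition at a time only when the rest of the pattern fails to match. That is why the pattern #(\w{1,2})\1\1 can match both #000000 and #000. For #000 the group first captures 00, but then the backreference \1 cannot match 00 because only a single 0 is left, so \w{1,2} backtracks 1 position, the group captures 0, and both backreferences \1\1 match.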
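To see the greedy-then-backtrack behaviour in practice, here is a minimal sketch in Python, assuming the standard re module and the test string from the question. The variable names (text, greedy_only, with_backrefs) are only illustrative.

```python
import re

# Test string from the question.
text = "24 bit:\nBlack #000000\n12 bit:\nBlack #000"

# {1,2} on its own is greedy: it takes two characters whenever two are available.
greedy_only = re.findall(r"#(\w{1,2})", text)
print(greedy_only)    # ['00', '00']

# With \1\1 appended, capturing "00" in "#000" leaves only one "0" for the
# first backreference, so the engine backtracks and retries with one character.
with_backrefs = re.findall(r"#(\w{1,2})\1\1", text)
print(with_backrefs)  # ['00', '0']
```

The engine does not look ahead to pick the right count. It simply tries 2 first and falls back to 1 when the remainder of the pattern cannot match.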
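You can make the pattern a bit more specific by matching only hexadecimal digits instead of \w (which also matches other letters and the underscore):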
#([0-9a-fA-F]{1,2})\1\1
Or, if the match should not be directly preceded or followed by a non-whitespace character:
(?<!\S)#([0-9a-fA-F]{1,2})\1\1(?!\S)
See a regex101 demo.
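The effect of the whitespace boundaries can also be checked with a short Python sketch. The sample strings and the names pattern, samples and results are made up for illustration and are not part of the original test data.

```python
import re

# Hex-only pattern with whitespace boundaries on both sides.
pattern = re.compile(r"(?<!\S)#([0-9a-fA-F]{1,2})\1\1(?!\S)")

# Hypothetical sample inputs chosen to exercise each part of the pattern.
samples = ["#000000", "#000", "#fff", "#a1a1a1", "#ggg", "x#000", "#0000"]

# Map each sample to whether the pattern is found in it.
results = {s: bool(pattern.search(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

In this sketch the first four samples match. "#ggg" fails because g is not a hex digit, "x#000" fails the lookbehind because a non-whitespace character precedes the #, and "#0000" fails the lookahead because a fourth 0 follows the three matched characters.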