I have the following Python regular expression, and there is one thing about it I don't understand: why doesn't it also match the first alternation?
Regex (spaced for better understanding):
(?:
\$\{
(?P<braced>
[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
)
\}
)
| ### SECOND ALTERNATION ###
(?:
\$
(?P<named>
[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
)
)
Test String:
asdasd $asd:sd + ${asd123:asd} $HOME $$asd
Matched stuff:
asdasd $asd:sd + ${asd123:asd} $HOME $$asd
According to the regex pattern above, the first alternation should also appear, namely:
${asd123:asd}
It seems I don't quite get the alternation pattern?
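To capture the whole ${...} match, remove the ?: so that the non-capturing groups become capturing ones. You can name them as well. Also, [_a-zA-Z0-9] is equivalent to \w (for ASCII text; in Python 3, \w on str patterns also matches non-ASCII word characters unless re.ASCII is set), so we can shorten your regex a bit: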
(?P<Alternation1>
\$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
\}
)
|
(?P<Alternation2>
\$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+
)
)
Have a look at the demo. This regex requires the x (verbose) option. On regex101.com, the g option is also needed to show all matches; in Python, you'd use findall or finditer instead.
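As a rough illustration of those options, assuming Python 3 and the standard re module: re.VERBOSE plays the role of the x flag, and findall/finditer take the place of g. The pattern below is a deliberately simplified stand-in (hypothetical, not the regex from the question), written once compactly and once in verbose form.

```python
import re

# Simplified stand-in pattern (hypothetical): "$" followed by a name.
compact = re.compile(r"\$(?P<name>[_a-zA-Z]\w*)")

# Same pattern in verbose form: whitespace and "#" comments are ignored.
verbose = re.compile(r"""
    \$                       # literal dollar sign
    (?P<name>[_a-zA-Z]\w*)   # the variable name
""", re.VERBOSE)

text = "echo $HOME and $USER"

# With exactly one capturing group, findall returns that group's text.
found_compact = compact.findall(text)
found_verbose = verbose.findall(text)

# finditer yields match objects, so the full match is available as group(0).
full_matches = [m.group(0) for m in verbose.finditer(text)]

print(found_compact)
print(found_verbose)
print(full_matches)
```

Both spellings of the pattern behave the same; the difference between findall and finditer is only in what they hand back (captured groups versus match objects).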
More information about non-capturing groups is available on SO and at regular-expressions.info.
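A minimal sketch of the difference, assuming Python 3: the two patterns below are shortened, hypothetical versions of the alternation (using \w+:\w+ instead of the full name rule) and differ only in whether the outer groups are capturing.

```python
import re

text = "${a:b} $c:d"

# Outer groups are non-capturing: only the inner groups are reported.
non_capturing = re.compile(r"(?:\$\{(\w+:\w+)\})|(?:\$(\w+:\w+))")

# Outer groups are capturing: the text including "$", "{" and "}" is reported too.
capturing = re.compile(r"(\$\{(\w+:\w+)\})|(\$(\w+:\w+))")

# findall returns one tuple per match, with "" for groups that did not participate.
inner_only = non_capturing.findall(text)
with_outer = capturing.findall(text)

print(inner_only)
print(with_outer)
```

Both patterns find the same two matches; the capturing version simply exposes the delimiters as extra groups.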
To just get all matches in Python, you can use finditer like this:
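import re

# Verbose pattern: the named outer groups capture the "$", "{" and "}" as well
p = re.compile(r'''(?P<Alternation1>
\$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
\}
)
|
(?P<Alternation2>
\$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+
)
)
''', re.VERBOSE)
test_str = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"
# Print the full text of every match (Python 3 syntax)
print([m.group(0) for m in p.finditer(test_str)])
# prints ['$asd:sd', '${asd123:asd}']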
See IDEONE demo
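If you also want to know which alternation produced each match, one possible approach (a sketch assuming Python 3, reusing the named groups from the regex above) is to filter groupdict() down to the groups that actually took part in the match:

```python
import re

pattern = re.compile(r"""
    (?P<Alternation1>
        \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)\}
    )
    |
    (?P<Alternation2>
        \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
    )
""", re.VERBOSE)

test_str = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

results = []
for m in pattern.finditer(test_str):
    # Groups from the alternative that did not match are None; drop them.
    groups = {k: v for k, v in m.groupdict().items() if v is not None}
    results.append(groups)
    print(groups)
```

Note that $HOME and $$asd are not matched at all, because the pattern requires at least one colon-separated part after the first name.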