
Why does the first alternation not match?


I have the following regular expression (Python), and there is one thing about it that I don't understand: why doesn't the first alternation match as well?

Regex (spaced for better understanding):

(?:
  \$\{
    (?P<braced>
       [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
    )
  \}
)
|   ### SECOND ALTERNATION ###
(?:
  \$
   (?P<named>
     [_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z][_a-zA-Z0-9]*)+
   )
)

Test String:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

Matched stuff:

asdasd $asd:sd + ${asd123:asd} $HOME $$asd

According to the regex pattern above, the first alternation should also appear, namely:

${asd123:asd}

It seems I don't quite get the alternation pattern?


Solution

  • To capture ${...}, remove the ?: so that the non-capturing groups become capturing ones. You can name them as well. Also, [_a-zA-Z0-9] is equivalent to \w for ASCII input (in Python 3, \w also matches non-ASCII word characters unless re.ASCII is set), so the regex can be shortened a bit:

    (?P<Alternation1>
      \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)\}
    )
    |
    (?P<Alternation2>
      \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
    )
    

    Have a look at the demo. This regex requires the x (verbose) option. On regex101.com you also need the g option to show all matches; in Python, you'd use findall or finditer instead.

    More information about non-capturing groups is available on SO and at regular-expressions.info.

    To just get all matches in Python, you can use findall like this:

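    import re

    # re.VERBOSE lets the pattern span several lines; whitespace in it is ignored.
    p = re.compile(r'''
        (?P<Alternation1>
          \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)\}
        )
        |
        (?P<Alternation2>
          \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)
        )
    ''', re.VERBOSE)
    test_str = "asdasd $asd:sd + ${asd123:asd} $HOME $$asd"

    # findall returns one tuple per match, in group order:
    # (Alternation1, braced, Alternation2, named); unmatched groups are ''.
    print(p.findall(test_str))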
    

    See IDEONE demo
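
If you would rather use finditer and see which branch of the alternation produced each match, a minimal Python 3 sketch could look like the following. It assumes the same two sub-patterns as above but drops the outer Alternation1/Alternation2 groups, so each branch has exactly one named group; the helper name find_placeholders is made up for illustration.

```python
import re

# Same two branches as in the answer, one named group per branch.
# With re.VERBOSE, whitespace is ignored and '#' starts a comment.
PATTERN = re.compile(r'''
    \$\{(?P<braced>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)\}   # first branch
    |
    \$(?P<named>[_a-zA-Z][a-zA-Z0-9]*(?::[_a-zA-Z]\w*)+)        # second branch
''', re.VERBOSE)


def find_placeholders(text):
    """Return (branch, value) pairs, where branch is 'braced' or 'named'."""
    # m.lastgroup is the name of the last capturing group that matched;
    # since each branch has a single group, it identifies the branch.
    return [(m.lastgroup, m.group(m.lastgroup)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_placeholders("asdasd $asd:sd + ${asd123:asd} $HOME $$asd"))
```

Note that $HOME and $$asd are not reported: the trailing (?::...)+ requires at least one ":part" after the first identifier.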