
Python regex: match either part, or both joined by a separator


I need a regular expression to match a, b or a;b.

I cannot write a|b|a;b because a and b contain named groups, and repeating them raises an exception: redefinition of group name 'a' as group 8; was group 3 at position 60.

a;?b does not work either because ab must not be matched.

How would you solve this? Is this possible with the re library? I have heard there is also a library called pyparsing. Would that be better suited for this problem?


Background: This is a follow-up question to this one. Because it does not seem to be possible to pass color codes through urwid or curses, I am trying to decode the color codes I get from git so that urwid can re-encode these colors.

To avoid problems with copy & paste I am leaving out the leading control character in the following regular expressions:

This regex works, except that it does not match [1m (bold), which is used in a test program:

reo_color_code = re.compile(
    r'\['
    r'((?P<series>[01]);)?'
    r'((?P<fgbg>[34])(?P<color>[0-7]))?'
    r'm'
)

This regex does not compile:

reo_color_code = re.compile(
    r'\['
    r'('
        r'((?P<series>[01]))'
        r'|'
        r'((?P<fgbg>[34])(?P<color>[0-7]))'
        r'|'
        r'((?P<series>[01]));((?P<fgbg>[34])(?P<color>[0-7]))'
    r')'
    r'm'
)

It throws the exception:

re.error: redefinition of group name 'series' as group 8; was group 3 at position 60
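
The error arises because re requires every group name in a pattern to be unique, even across alternatives. One possible workaround, sketched here under assumptions, is to give each alternative its own group names and merge them after matching. The suffixes `_1`, `_2`, `_3` and the names `COLOR_CODE` and `parse_color_code` are made up for this illustration, and the leading control character is left out as in the question.

```python
import re

# Each alternative carries its own (hypothetical) suffix so that no
# group name is defined twice in the pattern.
COLOR_CODE = re.compile(
    r'\['
    r'(?:'
    r'(?P<series_1>[01]);(?P<fgbg_1>[34])(?P<color_1>[0-7])'
    r'|'
    r'(?P<series_2>[01])'
    r'|'
    r'(?P<fgbg_3>[34])(?P<color_3>[0-7])'
    r')'
    r'm'
)


def parse_color_code(s):
    """Return a dict with keys series, fgbg, color, or None if s does not match."""
    m = COLOR_CODE.fullmatch(s)
    if m is None:
        return None
    merged = {}
    for name, value in m.groupdict().items():
        base = name.rsplit('_', 1)[0]  # strip the per-alternative suffix
        if value is not None:
            merged[base] = value
        else:
            merged.setdefault(base, None)
    return merged


for text in ('[1m', '[37m', '[1;37m', '[137m'):
    print(text, parse_color_code(text))
```

Only one alternative can match at a time, so at most one of the suffixed groups for each base name is set, and the merge step never has to choose between two values.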

Solution

  • In this case I wouldn't try to build a single regex for the whole problem. Instead I'd write a function like the following, which still uses re, but in separate steps:

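    import re

    def get_info(s):
        # Expect the code without the leading control character, e.g. '[1;37m'.
        if s.startswith('[') and s.endswith('m'):
            p = s[1:-1]
            if ';' in p:
                # Both parts present: series;fgbg+color
                m = re.match(r'^([01]);([34])([0-7])$', p)
            else:
                # Exactly one part: series alone, or fgbg+color alone
                m = re.match(r'^([01])$|^([34])([0-7])$', p)
            if m:
                return m.groups()
        return None, None, None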
    

    You can use it like:

    >>> series, fgbg, color = get_info('[1;37m')
    >>> series, fgbg, color
    ('1', '3', '7')
    

    PS: I haven't tested this extensively. Hope it helps.
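
If a single pattern with the original group names is still preferred, lookaheads are one way to get there without repeating any group. The following is a minimal sketch, not a tested drop-in: `COLOR_CODE_RE` is a name invented here, and the pattern assumes the same simplified codes as the question (leading control character omitted).

```python
import re

COLOR_CODE_RE = re.compile(
    r'\['
    r'(?=\d)'                    # at least one of the two parts must follow
    r'(?:(?P<series>[01])'
    r'(?:;(?=[34])|(?=m)))?'     # after series: ';' then fgbg, or the final 'm'
    r'(?:(?P<fgbg>[34])(?P<color>[0-7]))?'
    r'm'
)

for text in ('[1m', '[37m', '[1;37m', '[137m', '[1;m', '[m'):
    m = COLOR_CODE_RE.fullmatch(text)
    print(text, m.groupdict() if m else None)
```

The separator is only consumed when the second part actually follows, so inputs such as [137m (no separator) and [1;m (dangling separator) are rejected, while each named group appears exactly once in the pattern.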