python regex syntax

Getting captured group in one line


There is a known "pattern" to get the captured group value or an empty string if no match:

match = re.search('regex', 'text')
if match:
    value = match.group(1)
else:
    value = ""

or:

match = re.search('regex', 'text')
value = match.group(1) if match else ''

Is there a simple and pythonic way to do this in one line?

In other words, can I provide a default for a capturing group in case it's not found?


For example, I need to extract all alphanumeric characters (and _) from the text after the key= string:

>>> import re
>>> PATTERN = re.compile(r'key=(\w+)')
>>> def find_text(text):
...     match = PATTERN.search(text)
...     return match.group(1) if match else ''
... 
>>> find_text('foo=bar,key=value,beer=pub')
'value'
>>> find_text('no match here')
''

Is it possible for find_text() to be a one-liner?

It is just an example, I'm looking for a generic approach.


Solution

  • Quoting from the MatchObjects docs,

    Match objects always have a boolean value of True. Since match() and search() return None when there is no match, you can test whether there was a match with a simple if statement:

    match = re.search(pattern, string)
    if match:
       process(match)
    

    Since the check cannot be avoided, and you are already wrapping the search in a function, I would like to present this alternative:

    def find_text(text, matches = lambda x: x.group(1) if x else ''):
        return matches(PATTERN.search(text))
    
    assert find_text('foo=bar,key=value,beer=pub') == 'value'
    assert find_text('no match here') == ''
    

    It does exactly the same thing; the only difference is that the check you need to perform has been moved into a default parameter.
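On Python 3.8 or later, the same check can arguably be folded into a single return statement with an assignment expression. This is only a sketch that reuses PATTERN from the question; the local name m is purely illustrative.

```python
import re

PATTERN = re.compile(r'key=(\w+)')

def find_text(text):
    # The condition is evaluated first, so m is bound before m.group(1) runs.
    return m.group(1) if (m := PATTERN.search(text)) else ''
```

Whether this reads better than the two-line version is a matter of taste; the behaviour is meant to be identical.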

    Building on @Kevin's solution and @devnull's suggestions in the comments, you can also do something like this:

    def find_text(text):
        return next((item.group(1) for item in PATTERN.finditer(text)), "")
    

    This takes advantage of the fact that next() accepts a default value as its second argument and returns it when the iterator is exhausted. However, it has the overhead of creating a generator expression on every call, so I would stick to the first version.
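Since the question asks for a generic approach, the next() idea could be wrapped in a small reusable helper. This is a minimal sketch, not part of the re module: the name first_group and its group and default parameters are hypothetical, chosen only for illustration.

```python
import re

PATTERN = re.compile(r'key=(\w+)')

def first_group(pattern, text, group=1, default=''):
    """Return the given group of the first match, or default if nothing matches.

    Hypothetical helper. Note that default only covers the "no match" case:
    if the pattern matches but the group did not participate, group() yields None.
    """
    return next((m.group(group) for m in pattern.finditer(text)), default)

def find_text(text):
    # One-liner built on the helper above.
    return first_group(PATTERN, text)
```

With this in place, find_text('foo=bar,key=value,beer=pub') should give 'value', and a string without key= should fall back to the default.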