Say the task were to append the last numbers in a product code to itself with a hyphen between the original and added numbers (purely for experimentation).
I would like to understand why including a space is necessary in the following example:
with foo ( prod )
as ( values ('MYPRODUCT 123'))
select
'dot aster space' as test_type,
'''(.* (\d+))'',''$1-$2''' as the_regex,
regexp_replace(prod,'(.* (\d+))','$1-$2')
from foo
UNION ALL
select
'dot aster no space',
'''(.*(\d+))'',''$1-$2''',
regexp_replace(prod,'(.*(\d+))','$1-$2')
from foo
Result
TEST_TYPE THE_REGEX REGEXP_REPLACE
dot aster space '(.* (\d+))','$1-$2' MYPRODUCT 123-123
dot aster no space '(.*(\d+))','$1-$2' MYPRODUCT 123-3
I would have expected that, since the period matches any character, including a blank space, the two regexes would have the same result.
However, even accepting that they do not, I can't figure out why only the last 3 is captured in the second group.
Thanks.
It's a matter of greediness.
With the regex
'(.* (\d+))'
you ask explicitely for a space before the digits, so \d+ will get the 3 digits.
With the regex
'(.*(\d+))'
the dot .* will take as many characters as it can before matching a digit or more. So .* will match 'MYPRODUCT 12' and \d+ will match '3'.
Solution : the non-greedy quantifier '?'. The regex would be
'(.*?(\d+))'
and it will match a maximum digits for \d+, then the remainder for .*