Tags: regex, regex-group, regexp-replace

Regex Replace function: in cases of no match, $1 returns full line instead of null


Test link: regexr.com/42d9c

This has been driving me crazy.

I want to extract barcodes in the lines below:

Ceres Juice Apricot 12 x 1lt unit: 6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
Ceres Juice Guava 12 x 1lt.. unit:6001240222829
Ceres Juice Orange 12x1lt... unit:
Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660

It should return:

6001240102022

6001240222829

6001240100660

I currently use .*(\d{13}).*

And then I use $1 as the replacement so that it returns the first capture group

But my results look like this:

6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
6001240222829
Ceres Juice Orange 12x1lt... unit:
6001240100660

Cause:

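The cause of this problem is that 'Replace' only rewrites the text that the pattern matches. On a line with no 13-digit run the pattern does not match at all, so nothing is replaced and the original line is returned unchanged, rather than an empty $1.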
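The behaviour described above can be reproduced with a minimal Python sketch. It assumes Python's re module as the engine, where the replacement reference is written \1 rather than $1; the variable names are only illustrative.

```python
import re

# Two of the sample lines from the question: one with a barcode, one without.
lines = [
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
]

# The original pattern from the question.
pattern = re.compile(r".*(\d{13}).*")

# \1 is Python's spelling of $1 in the replacement string.
results = [pattern.sub(r"\1", line) for line in lines]

for result in results:
    print(repr(result))
```

The first line is reduced to its barcode. The second line contains no 13-digit run, so the pattern never matches and sub() hands the line back untouched.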

Workaround:

Ensure that there is a 'match' on every single line, and put this into Match Group 1 ($1). Then put your actual match into Match Group 2 ($2). How to do this?

Language/Platform:

Any. I have tried all online Regex websites and also Notepad++


Solution

  • You may add an alternative that matches any string,

    .*(\d{13}).*|.*
    

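    The point is that the first alternative is tried first, and if there are 13 consecutive digits on a line, that alternative will "win" and the fallback .* won't be used. $1 will hold the 13 digits then. On a line without 13 consecutive digits, the first alternative fails and .* matches the whole line with Group 1 left empty, so replacing with $1 produces an empty line. See the regex demo.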

    Alternatively, an optional non-capturing group with the obligatory digit capturing group:

    (?:.*(\d{13}))?.*
    

    See the regex demo

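    Here, (?:.*(\d{13}))? is tried first (as ? is a greedy quantifier matching 1 or 0 times). If the line contains 13 consecutive digits, the group matches once and places them into Group 1 after any 0+ chars other than line break chars. If it does not, the group matches zero times and Group 1 stays empty. In both cases the .* at the end of the pattern matches the rest of the line, so every line is matched and replaced.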
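Both patterns from this answer can be checked against the sample lines with a short Python sketch. It assumes Python 3.7 or later, where a group that did not participate in the match is substituted as an empty string; the helper name extract_barcodes is hypothetical, and other engines (Notepad++, regexr) may differ in details.

```python
import re

LINES = [
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
    "Ceres Juice Guava 12 x 1lt.. unit:6001240222829",
    "Ceres Juice Orange 12x1lt... unit:",
    "Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660",
]

# Variant 1: fall back to a plain .* when no 13-digit run exists.
ALTERNATION = re.compile(r".*(\d{13}).*|.*")

# Variant 2: make the prefix-plus-digits part optional.
OPTIONAL_GROUP = re.compile(r"(?:.*(\d{13}))?.*")


def extract_barcodes(pattern, lines):
    """Replace each line with Group 1; lines without a barcode become ''."""
    return [pattern.sub(r"\1", line) for line in lines]


for name, pattern in [("alternation", ALTERNATION), ("optional group", OPTIONAL_GROUP)]:
    print(name, extract_barcodes(pattern, LINES))
```

Because each pattern now matches every line, the lines without a barcode come back as empty strings instead of the original text, which is the output the question asks for.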