Test link: regexr.com/42d9c
This has been driving me crazy.
I want to extract barcodes in the lines below:
Ceres Juice Apricot 12 x 1lt unit: 6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
Ceres Juice Guava 12 x 1lt.. unit:6001240222829
Ceres Juice Orange 12x1lt... unit:
Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660
It should return:
6001240102022
6001240222829
6001240100660
I currently use .*(\d{13}).* and then $1 as the replacement, for it to return the first match group.
But my results look like this:
6001240102022
Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:
6001240222829
Ceres Juice Orange 12x1lt... unit:
6001240100660
Cause:
The cause of this problem is that 'Replace' returns the original string if there is nothing in the match group ($1).
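A minimal Python sketch of the behaviour described above, assuming Python's re.sub as a stand-in for the 'Replace' feature of the tools tried (the variable names are made up for illustration). It suggests that lines without 13 consecutive digits are not matched at all, so the replacement never touches them:

```python
import re

# Sample lines from the question.
text = "\n".join([
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
    "Ceres Juice Guava 12 x 1lt.. unit:6001240222829",
    "Ceres Juice Orange 12x1lt... unit:",
    "Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660",
])

# Python writes the $1 backreference as \1 in the replacement string.
replaced = re.sub(r".*(\d{13}).*", r"\1", text)

# Lines without a 13-digit run never match, so they pass through unchanged.
print(replaced)
```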
Workaround:
Ensure that there is a 'match' on every single line, and put this into Match Group 1 ($1). Then put your actual match into Match Group 2 ($2). How to do this?
Language/Platform:
Any. I have tried all online Regex websites and also Notepad++
You may add an alternative that matches any string:
.*(\d{13}).*|.*
The point is that the first alternative is tried first, and if there are 13 consecutive digits on a line, that alternative will "win" and the second one, .*, won't trigger. $1 will hold the 13 digits then. On any other line, .* matches the whole line and $1 is empty. See the regex demo.
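As a rough illustration of the alternation approach, here is a small Python sketch. It assumes Python's re module rather than Notepad++ or an online tester, and the names text, pattern, replaced and barcodes are hypothetical. A function is used as the replacement so that an unset Group 1 becomes an empty string:

```python
import re

# Sample lines from the question.
text = "\n".join([
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
    "Ceres Juice Guava 12 x 1lt.. unit:6001240222829",
    "Ceres Juice Orange 12x1lt... unit:",
    "Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660",
])

# The first alternative captures the barcode; the bare .* fallback
# matches every other line and leaves group 1 unset.
pattern = re.compile(r".*(\d{13}).*|.*")

# Replace each match with group 1, or with nothing if group 1 is unset.
replaced = pattern.sub(lambda m: m.group(1) or "", text)

# Lines without a barcode are now empty; drop them to keep only barcodes.
barcodes = [line for line in replaced.splitlines() if line]
print(barcodes)
```

Note that the lines without a barcode are emptied, not removed, so a separate step (here the list comprehension) is still needed to drop the blank lines.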
Alternatively, an optional non-capturing group with the obligatory digit capturing group:
(?:.*(\d{13}))?.*
See the regex demo
Here, (?:.*(\d{13}))? is tried first (as ? is a greedy quantifier matching 1 or 0 times): if the line contains 13 consecutive digits, it matches any 0+ chars other than line break chars and places the digits into Group 1; otherwise the group is skipped and Group 1 stays empty. The .* at the end of the pattern will match the rest of the line.
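To see the "1 or 0 times" behaviour of the optional group, the following Python sketch matches each sample line on its own and inspects Group 1. It assumes Python's re module, where an unmatched group is reported as None; the names lines, pattern and captures are only illustrative:

```python
import re

# Sample lines from the question.
lines = [
    "Ceres Juice Apricot 12 x 1lt unit: 6001240102022",
    "Ceres Juice Cranberry& Kiwi Juice 12 x 1lt... unit:",
    "Ceres Juice Guava 12 x 1lt.. unit:6001240222829",
    "Ceres Juice Orange 12x1lt... unit:",
    "Ceres Juice Medley of Fruits 1L x 12 unit: 6001240100660",
]

# The optional non-capturing group wraps the obligatory digit group,
# and the trailing .* consumes whatever is left of the line.
pattern = re.compile(r"(?:.*(\d{13}))?.*")

# The pattern matches every line; group 1 is None when the optional
# group was skipped because the line has no 13-digit run.
captures = [pattern.fullmatch(line).group(1) for line in lines]
print(captures)
```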