Having the following input:
Testing 42702434884
Testing 064352729-13
05.994.401/0001-53
Testing 134.632.125-03
I am trying to match the lines that contain numbers, but only the lines that start with text, and to remove that text from the result.
So far I have tried the following expression:
(?!a-zA-Z)\b(\d{11}|\d{14})|(\d{3}\.\d{3}\.\d{3}\-\d{2}|\d{3}\d{3}\d{3}\-\d{2})|(\d{2}\.\d{3}.\d{3}\/\d{4}-\d{2}|\d{2}\d{3}\d{3}\d{4}-\d{2})\b
I was able to remove the text from the result and to find the lines containing the patterns, but I could not restrict the matches to lines starting with text.
How can I filter lines starting with text while removing the text from the result?
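The negative lookahead (?!a-zA-Z) in the pattern you tried asserts that the literal text a-zA-Z does not follow; it is not a character class. Because what follows at that position is a digit, the assertion is always true and can be omitted.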
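A small Java sketch can make this visible. The class name LookaheadDemo and the helper found are made up for illustration, and the pattern is shortened to the 11-digit alternative to keep the example readable:

```java
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Returns true when the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String withText = "Testing 42702434884";
        String withoutText = "42702434884";

        // With or without the lookahead, the result is the same.
        System.out.println(found("(?!a-zA-Z)\\b\\d{11}", withText));    // true
        System.out.println(found("\\b\\d{11}", withText));              // true

        // The lookahead does not reject a line that has no leading text.
        System.out.println(found("(?!a-zA-Z)\\b\\d{11}", withoutText)); // true

        // It only fails where the literal characters a-zA-Z follow.
        System.out.println(found("^(?!a-zA-Z)", "a-zA-Z"));             // false
    }
}
```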
Instead of the negative lookahead (?!a-zA-Z), you can use the anchor ^ to assert the start of the string (or the start of each line when the multiline flag is enabled), then match one or more chars a-zA-Z followed by a space. Make that part optional with (?:[a-zA-Z]+ )? if you want to match all the examples.
Then add a group around all the alternations.
If you don't need all the capturing groups, you can make them non-capturing with (?: and keep a capturing group only around the numbers that you want to keep.
The values are in group 1.
^(?:[a-zA-Z]+ )?((?:\d{11}|\d{14})|(?:\d{3}\.\d{3}\.\d{3}\-\d{2}|\d{3}\d{3}\d{3}\-\d{2})|(?:\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}|\d{2}\d{3}\d{3}\d{4}-\d{2})\b)
Note that in Java you have to double the backslashes when you write the pattern as a string literal.
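As a rough illustration, this is how the pattern with the optional text could look in Java. The class name OptionalPrefixDemo and the method extract are hypothetical. The sketch assumes the whole input is one string, so it uses Pattern.MULTILINE to let ^ match at the start of every line, and it escapes every literal dot so that it matches only a dot:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalPrefixDemo {
    // The regex written as a Java string literal: each backslash is doubled.
    // MULTILINE makes ^ match at the start of every line of the input.
    static final Pattern NUMBER = Pattern.compile(
        "^(?:[a-zA-Z]+ )?("
            + "(?:\\d{11}|\\d{14})"
            + "|(?:\\d{3}\\.\\d{3}\\.\\d{3}\\-\\d{2}|\\d{3}\\d{3}\\d{3}\\-\\d{2})"
            + "|(?:\\d{2}\\.\\d{3}\\.\\d{3}\\/\\d{4}-\\d{2}|\\d{2}\\d{3}\\d{3}\\d{4}-\\d{2})\\b"
            + ")",
        Pattern.MULTILINE);

    // Collects the value of capturing group 1 for every match.
    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "Testing 42702434884",
            "Testing 064352729-13",
            "05.994.401/0001-53",
            "Testing 134.632.125-03");
        // prints [42702434884, 064352729-13, 05.994.401/0001-53, 134.632.125-03]
        System.out.println(extract(input));
    }
}
```

All four lines match here because the leading text is optional; the text itself never ends up in group 1.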
To get only the 3 matches on lines that start with text, make the leading text mandatory:
^[a-zA-Z]+ ((?:\d{11}|\d{14})|(?:\d{3}\.\d{3}\.\d{3}\-\d{2}|\d{3}\d{3}\d{3}\-\d{2})|(?:\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}|\d{2}\d{3}\d{3}\d{4}-\d{2})\b)
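A possible Java version of this stricter pattern, checking one line at a time so that no multiline flag is needed. The names RequiredPrefixDemo and extract are hypothetical, and the literal dots are escaped so that they match only a dot:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RequiredPrefixDemo {
    // The leading text is mandatory here, so a line that starts with a digit cannot match.
    static final Pattern NUMBER_AFTER_TEXT = Pattern.compile(
        "^[a-zA-Z]+ ("
            + "(?:\\d{11}|\\d{14})"
            + "|(?:\\d{3}\\.\\d{3}\\.\\d{3}\\-\\d{2}|\\d{3}\\d{3}\\d{3}\\-\\d{2})"
            + "|(?:\\d{2}\\.\\d{3}\\.\\d{3}\\/\\d{4}-\\d{2}|\\d{2}\\d{3}\\d{3}\\d{4}-\\d{2})\\b"
            + ")");

    // Returns group 1 for each line that starts with text followed by a number.
    static List<String> extract(List<String> lines) {
        List<String> numbers = new ArrayList<>();
        for (String line : lines) {
            Matcher m = NUMBER_AFTER_TEXT.matcher(line);
            if (m.find()) {
                numbers.add(m.group(1));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Testing 42702434884",
            "Testing 064352729-13",
            "05.994.401/0001-53",
            "Testing 134.632.125-03");
        // prints [42702434884, 064352729-13, 134.632.125-03]
        System.out.println(extract(lines));
    }
}
```

The line 05.994.401/0001-53 is skipped because it has no leading text, which leaves the 3 expected matches.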