This is my string: file_1234_test.pdf
The task is to capture the filename without its extension and also the number inside it.
So the result should be:
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234
I found Stack-58379142 but it does not answer my question.
I tested the following patterns on regex101 and regexstorm.
Step 1. as expected
> (.*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
Step 2. as expected : greedy '+' quantifier
> (\d+)
> Match 1 = 1234
> Group 1 = 1234
Step 3. still as expected
> ((\d+).*)
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test.pdf
> Group 2 = 1234
Step 4. once again as expected
> ((\d+).*)\.pdf
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test
> Group 2 = 1234
Step 5. not as expected : the '+' quantifier suddenly seems to have become lazy
> (.*(\d+).*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 4
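The Step 5 result can also be reproduced outside the online testers. Below is a minimal sketch using Python's `re` module; the choice of Python is an assumption, since no regex flavor is named here, but the greedy backtracking behaves the same way.

```python
import re

s = "file_1234_test.pdf"

# The leading .* is greedy: it first consumes the whole string, then
# gives characters back one at a time until the rest of the pattern fits.
# As soon as \d+ can match a single digit, the backtracking stops.
m = re.search(r"(.*(\d+).*)\.pdf", s)
groups = m.groups()
print(groups)  # ('file_1234_test', '4')
```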
Of course, (.*(\d{4}).*)\.pdf or (.*_(\d+).*)\.pdf works:
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234
But then the pattern is (as I feel it) needlessly narrow and too specific. What if I have a list of hundreds and ...
So, question: is there a solution?
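You could try this regex pattern: (.*?(\d+).*)\.pdf
The '+' in (\d+) is still greedy. The problem is the leading .*, which is greedy too: it first consumes as much as it can and gives back only enough for \d+ to match a single digit, which is why Group 2 ends up as 4.
Writing the first part as .*? makes it lazy, so it stops at the first digit and \d+ can take the whole number.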
See demo here
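The lazy pattern can be checked against a few filenames with a short Python sketch. The helper name `split_name`, the constant `PATTERN`, and the extra sample filenames are hypothetical, introduced only for illustration, and `fullmatch` is used on the assumption that each string is a single filename.

```python
import re

# Lazy .*? stops at the first digit, so the greedy \d+ gets the whole number.
PATTERN = re.compile(r"(.*?(\d+).*)\.pdf")

def split_name(filename):
    """Return (name_without_extension, first_number), or None if no match."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        return None
    return m.group(1), m.group(2)

for name in ["file_1234_test.pdf", "a12_b34.pdf", "readme.pdf"]:
    print(name, "->", split_name(name))
```

Note that with several numbers in one name, Group 2 holds the first one, and a name without any digit does not match at all.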