Tags: regex, grep, grouping, quantifiers

RegEx : Nested Groups and Quantifiers


This is my string: file_1234_test.pdf
The task is to capture the file name without its extension, and also the number inside it.
So the result should be:

> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234

I found Stack-58379142 but it does not answer my question.

I tested the following patterns on regex101 and regexstorm.

Step 1. as expected

> (.*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test

Step 2. as expected : greedy '+' quantifier

> (\d+)
> Match 1 = 1234
> Group 1 = 1234

Step 3. still as expected

> ((\d+).*)
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test.pdf
> Group 2 = 1234

Step 4. once again as expected

> ((\d+).*)\.pdf
> Match 1 = 1234_test.pdf
> Group 1 = 1234_test
> Group 2 = 1234

Step 5. '+' quantifier suddenly became lazy

> (.*(\d+).*)\.pdf
> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 4
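
The results of steps 4 and 5 can be reproduced outside the online testers. The following is a minimal sketch in Python, assuming its `re` module backtracks the same way as the engines used above for these particular patterns. The variable names are illustrative only. It also prints the part of the string that comes before group 2 in step 5, which shows how much the leading `.*` kept.

```python
import re

s = "file_1234_test.pdf"

# Step 4: nothing precedes (\d+), so it starts at the first digit.
m4 = re.search(r"((\d+).*)\.pdf", s)
step4 = (m4.group(0), m4.group(1), m4.group(2))

# Step 5: a leading .* now sits in front of (\d+).
m5 = re.search(r"(.*(\d+).*)\.pdf", s)
step5 = (m5.group(0), m5.group(1), m5.group(2))

# Everything before group 2 was consumed by the leading .*
leading = s[:m5.start(2)]

print(step4)    # ('1234_test.pdf', '1234_test', '1234')
print(step5)    # ('file_1234_test.pdf', 'file_1234_test', '4')
print(leading)  # file_123
```

The last line shows that the digits 1, 2 and 3 are not lost; they end up inside the leading `.*` rather than in group 2.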

Of course, (.*(\d{4}).*)\.pdf or (.*_(\d+).*)\.pdf works:

> Match 1 = file_1234_test.pdf
> Group 1 = file_1234_test
> Group 2 = 1234
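
To make the concern about these two patterns concrete, here is a small Python sketch, again assuming Python's `re` module stands in for the testers used above. The second and third file names are hypothetical, invented only to show where each pattern stops matching.

```python
import re

four_digits = re.compile(r"(.*(\d{4}).*)\.pdf")
underscore = re.compile(r"(.*_(\d+).*)\.pdf")

def groups_or_none(pattern, name):
    """Return (group 1, group 2) if the pattern matches, else None."""
    m = pattern.search(name)
    return m.groups() if m else None

# Both patterns handle the original string.
ok_four = groups_or_none(four_digits, "file_1234_test.pdf")
ok_under = groups_or_none(underscore, "file_1234_test.pdf")

# Hypothetical names that break the extra assumptions:
short_number = groups_or_none(four_digits, "file_12_test.pdf")     # not 4 digits
no_underscore = groups_or_none(underscore, "file1234_test.pdf")    # no _ before digits

print(ok_four, ok_under)            # both ('file_1234_test', '1234')
print(short_number, no_underscore)  # None None
```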

But then the pattern is, as I see it, needlessly narrow and too specific. What if I have a list of hundreds and ...

So, my question: is there a solution?


Solution

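  • You could try this regex pattern: (.*?(\d+).*)\.pdf

    The + quantifier never became lazy. In (.*(\d+).*)\.pdf the leading .* is greedy: it first consumes as much as it can and gives characters back only until \d+ can match at all, which leaves \d+ with just the last digit, 4. Writing .*? makes the leading part lazy, so it stops at the first digit and the still-greedy \d+ captures the whole run, 1234.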

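
As a rough illustration of how the lazy pattern behaves on a list of names, here is a short Python sketch. The helper `split_name` and every file name except file_1234_test.pdf are made up for this example, and Python's `re` module is assumed to behave like the testers mentioned in the question.

```python
import re

# Lazy leading .*? so that the greedy \d+ gets the full first run of digits.
PATTERN = re.compile(r"(.*?(\d+).*)\.pdf")

def split_name(filename):
    """Return (name without extension, first number) or None if no match."""
    m = PATTERN.search(filename)
    return (m.group(1), m.group(2)) if m else None

names = ["file_1234_test.pdf", "report_7.pdf", "a1_b22.pdf", "notes.pdf"]
results = {name: split_name(name) for name in names}

for name, result in results.items():
    print(name, "->", result)
# file_1234_test.pdf -> ('file_1234_test', '1234')
# report_7.pdf -> ('report_7', '7')
# a1_b22.pdf -> ('a1_b22', '1')
# notes.pdf -> None
```

Two things are worth noting: when a name contains several numbers, group 2 holds the first one, and a name without any digit does not match at all because \d+ requires at least one.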