Extracting an optional ID that appears before a special character in a filename


I have a regex that parses filenames, extracting an optional ID and a product name. It only partially works, and I would like to understand why and how to fix it.

The files look like this (simplified):

  • [email protected] ... 12 is the ID, product1 is the product name
  • [email protected] ... no ID available, product2 is the product name
  • product3.txt ... no @ as separator, product3 is the product name

So there is garbage (abc...def) in front of the @ sign, and the garbage may contain an ID. In reality the ID is more complicated (not just digits) but has a fixed format with a fixed length. The whole part up to and including the @ is optional as well.

This is the regex that nearly works:

^(.*?(?<id>\d{2}).*?@)?(?<product>.*)\.\w+$

It works for cases 1 and 3. As soon as I add another ? after the ID group so that case 2 also matches, the first case stops working.

Regex I thought would work:

^(.*?(?<id>\d{2})?.*?@)?(?<product>.*)\.\w+$

The first regex extracts the ID, but the ID must be present. The second regex does not extract the ID.

Can anyone explain to me why the second regex does not extract the ID and what I can do to fix it?

Thanks!!


Solution

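  • /^((?:.*?(?<id>\d{2}))?.*?@)?(?<product>.*)\.\w+$/gm

    You need to make the lazy prefix and the ID optional together, by wrapping both in an optional non-capturing group. The reason:

    .*?(?<id>\d{2})? - with the ? on the ID group alone, the leading .*? first matches nothing, the optional group finds no digits at the start of abc... and therefore matches empty, and the following .*?@ then consumes everything up to the @, digits included. The overall match succeeds, so the engine never backtracks to try the ID at a later position.

    (?:.*?(?<id>\d{2}))? - here the .*? has to expand until the two digits match, so the ID is captured whenever it is present. When it is absent, the whole group is skipped and .*?@ matches the garbage on its own.

    Note that writing \d{2}? does not help: a ? directly after {2} only makes a fixed-count quantifier lazy, which changes nothing, so the ID would still be required.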
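
For anyone who wants to check the behaviour locally, here is a minimal Python sketch that runs three variants against the three cases. The sample filenames are made up to fit the description (garbage, optional two-digit ID, @, product name), since the real ones are not shown, and the names `parse`, `ID_REQUIRED`, `ID_OPTIONAL_ALONE` and `PREFIX_AND_ID_OPTIONAL` are only illustrative. Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`; the patterns are otherwise the same. The third pattern is one possible fix, not necessarily the only one.

```python
import re

# Hypothetical filenames standing in for cases 1, 2 and 3.
SAMPLES = ["abc12def@product1.txt", "abcdef@product2.txt", "product3.txt"]

# The question's first regex: the ID is mandatory inside the optional prefix.
ID_REQUIRED = re.compile(r"^(.*?(?P<id>\d{2}).*?@)?(?P<product>.*)\.\w+$")

# The question's second regex: only the ID group is made optional.
ID_OPTIONAL_ALONE = re.compile(r"^(.*?(?P<id>\d{2})?.*?@)?(?P<product>.*)\.\w+$")

# Variant: the lazy prefix and the ID are optional together.
PREFIX_AND_ID_OPTIONAL = re.compile(
    r"^((?:.*?(?P<id>\d{2}))?.*?@)?(?P<product>.*)\.\w+$"
)


def parse(pattern, filename):
    """Return (id, product) for a filename, or None if the pattern does not match."""
    m = pattern.match(filename)
    return (m.group("id"), m.group("product")) if m else None


for name in SAMPLES:
    print(name)
    print("  id required:           ", parse(ID_REQUIRED, name))
    print("  id optional alone:     ", parse(ID_OPTIONAL_ALONE, name))
    print("  prefix and id optional:", parse(PREFIX_AND_ID_OPTIONAL, name))
```

With these samples, the first pattern returns the whole `abcdef@product2` as the product for case 2, the second pattern returns the right product names but never an ID, and the third returns `('12', 'product1')`, `(None, 'product2')` and `(None, 'product3')`.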