regex, pcre, regex-group, regex-greedy

How to capture a filename with or without an extension


I'm trying to capture and rewrite filenames that may or may not have an extension, such as 000035 ZSMS_1.mp3 and 000035 1OMNA. The goal is to reorder the parts so the result looks like, e.g., ZSMS_1(000035).mp3. I've tried:

^(\d+) (.*)(\..*$)?
^(\d+) (.*?)(\..*$)?

What I expect to happen:

000035 ZSMS_1.mp3:

[
  {
    "groups": [
        "000035",
        "ZSMS_1",
        ".mp3"
      ],
    "match": "000035 ZSMS_1.mp3"
  }
]

000035 1OMNA:

[
  {
    "groups": [
        "000035",
        "1OMNA",
        ""
      ],
    "match": "000035 1OMNA"
  }
]

What happens:

1. ^(\d+) (.*)(\..*$)?

000035 ZSMS_1.mp3:

[
  {
    "groups": [
        "000035",
        "ZSMS_1.mp3",
        ""
      ],
    "match": "000035 ZSMS_1.mp3"
  }
]

000035 1OMNA:

[
  {
    "groups": [
        "000035",
        "1OMNA",
        ""
      ],
    "match": "000035 1OMNA"
  }
]

2. ^(\d+) (.*?)(\..*$)?

000035 ZSMS_1.mp3:

[
  {
    "groups": [
        "000035",
        "",
        ""
      ],
    "match": "000035 "
  }
]

000035 1OMNA:

[
  {
    "groups": [
        "000035",
        "",
        ""
      ],
    "match": "000035 "
  }
]
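
Both results can be reproduced with a minimal Python sketch (Python's re module treats these two patterns the same way PCRE does; the helper name show is only for illustration). In the first pattern the greedy (.*) consumes the rest of the line, and because group 3 is optional the engine never has to backtrack to give it the extension. In the second pattern the lazy (.*?) matches nothing, the optional group is skipped, and since the $ sits inside that optional group nothing forces the match to reach the end of the string.

```python
import re

# The two patterns from the question, unchanged.
GREEDY = re.compile(r"^(\d+) (.*)(\..*$)?")
LAZY = re.compile(r"^(\d+) (.*?)(\..*$)?")

def show(pattern, text):
    """Return (whole match, groups), with unset groups shown as ""."""
    m = pattern.search(text)
    return m.group(0), tuple(g or "" for g in m.groups())

print(show(GREEDY, "000035 ZSMS_1.mp3"))
print(show(LAZY, "000035 ZSMS_1.mp3"))
```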

Solution

  • You may use

    ^(\d+)\h+(.*?)(\.[^.]*)?$
    

    Details

    • ^ - start of string
    • (\d+) - Group 1: one or more digits
    • \h+ - one or more horizontal whitespace characters (\h is not supported by every regex engine; for better cross-engine compatibility, use [^\S\r\n]+ or just [ \t]+ to match a tab or space)
    • (.*?) - Group 2: zero or more chars other than line break chars, as few as possible
    • (\.[^.]*)? - Group 3 (optional): a dot followed by zero or more chars other than a dot, as many as possible
    • $ - end of string.
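
As a rough illustration of the reordering the question asks for, here is a minimal Python sketch built on the pattern above. It assumes Python's re module, which does not support \h, so [ \t]+ is used instead. The function name reorder is hypothetical, and leaving non-matching names untouched is an assumption, not something the question specifies.

```python
import re

# [ \t]+ replaces \h+ because Python's re module does not support \h.
PATTERN = re.compile(r"^(\d+)[ \t]+(.*?)(\.[^.]*)?$")

def reorder(filename):
    """Turn '000035 ZSMS_1.mp3' into 'ZSMS_1(000035).mp3'."""
    m = PATTERN.match(filename)
    if m is None:
        return filename  # assumption: leave non-matching names untouched
    # Group 3 is None when there is no extension, so fall back to "".
    number, name, ext = m.group(1), m.group(2), m.group(3) or ""
    return f"{name}({number}){ext}"

print(reorder("000035 ZSMS_1.mp3"))  # ZSMS_1(000035).mp3
print(reorder("000035 1OMNA"))       # 1OMNA(000035)
```

Because [^.]* cannot cross a dot and the $ now sits outside the optional group, only the last dot-delimited segment ends up in group 3, so a name such as 000035 a.b.mp3 keeps a.b as the base name.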