php, regex, regex-group

Regex pattern with an optional trailing subgroup


I can have either of the following strings. Case 1:

VIR RECU 998721687978
DE: Mrs PAUL SMITH
564
MOTIF: ANY REASON

or case 2:

VIR RECU 998721687978
DE: Mrs PAUL SMITH
564

The "MOTIF: ..." line may be missing from the string.

I am looking for a regex to extract the substrings from both cases. So far I have: ^VIR\sRECU\s(\d+)\nDE:\s(.*)(\nMOTIF:\s(.*)) which works well for case 1 but not for case 2. If I add a question mark after the 'motif' capturing group, as in ^VIR\sRECU\s(\d+)\nDE:\s(.*)(\nMOTIF:\s(.*))? then this group is never captured.

I suppose the problem comes from the (.*) group, but I cannot figure out how to fix it.

Is it possible? Or should I use two different regexes, one for each case?

I am using these regexes in PHP with the preg_match() function.

The results I want are the values 998721687978, Mrs PAUL SMITH 564 and ANY REASON.


Solution

  • You may use

    ^VIR\s+RECU\s+(\d+)\nDE:\s+([\s\S]*?)(\nMOTIF:\s+(.*))?$
    

    See a regex test #1 and a regex test #2

    Regex details

    • ^ - start of string
    • VIR\s+RECU\s+ - VIR, 1+ whitespaces, RECU and again 1+ whitespaces
    • (\d+) - Group 1: one or more digits
    • \nDE: - a newline and DE: substring
    • \s+ - 1+ whitespaces
    • ([\s\S]*?) - Group 2: any 0+ chars, including line breaks, as few as possible
    • (\nMOTIF:\s+(.*))? - an optional capturing group #3:
      • \nMOTIF: - newline and MOTIF: string
      • \s+ - 1+ whitespaces
      • (.*) - Group 4: any 0+ chars other than line break chars, as many as possible
    • $ - end of string.
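
The pattern above can be tried out in PHP with a minimal sketch like the one below. In the original attempt, (.*) cannot cross a line break, so it stops after "Mrs PAUL SMITH"; the optional MOTIF group would then have to start at "\n564", so it is skipped. The lazy [\s\S]*? together with the $ anchor avoids that. The parseTransfer() helper and its array keys are hypothetical names chosen for illustration, and the sketch assumes the lines are separated by "\n" only (not "\r\n").

```php
<?php
// Hypothetical helper: applies the pattern from the answer to one string.
function parseTransfer(string $text): ?array
{
    // The pattern is wrapped in '/' delimiters; no flags are needed because
    // \n and [\s\S] are written out explicitly.
    $pattern = '/^VIR\s+RECU\s+(\d+)\nDE:\s+([\s\S]*?)(\nMOTIF:\s+(.*))?$/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null; // no match (or a PCRE error)
    }

    return [
        'reference' => $m[1],
        // Group 2 may span several lines; collapse whitespace runs to one space.
        'sender'    => preg_replace('/\s+/', ' ', trim($m[2])),
        // Groups 3 and 4 are absent from $m when the MOTIF line is missing.
        'motif'     => $m[4] ?? null,
    ];
}

$case1 = "VIR RECU 998721687978\nDE: Mrs PAUL SMITH\n564\nMOTIF: ANY REASON";
$case2 = "VIR RECU 998721687978\nDE: Mrs PAUL SMITH\n564";

print_r(parseTransfer($case1));
print_r(parseTransfer($case2));
```

With a single regex handling both cases, the caller only has to check whether 'motif' is null. Collapsing the whitespace in group 2 is an extra step added here to get "Mrs PAUL SMITH 564" on one line, as requested in the question; drop it if the line break should be kept.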