I can have either of the following strings. Case 1:
VIR RECU 998721687978
DE: Mrs PAUL SMITH
564
MOTIF: ANY REASON
or case 2:
VIR RECU 998721687978
DE: Mrs PAUL SMITH
564
The "MOTIF: ..." part can be missing from the string.
I am looking for a single regex that extracts the substrings in both of the cases above.
So far I have: ^VIR\sRECU\s(\d+)\nDE:\s(.*)(\nMOTIF:\s(.*))
That works well for case 1 but not for case 2. If I add a question mark after the 'motif' capturing group, like ^VIR\sRECU\s(\d+)\nDE:\s(.*)(\nMOTIF:\s(.*))?
then this group is never captured.
I suppose the problem comes from the (.*) group, but I cannot figure out how to fix it.
Is it possible? Or should I have 2 different regexes, 1 for each case?
I am using this regex in PHP with the preg_match() function.
The values I want are:
- 998721687978
- Mrs PAUL SMITH
  564
- ANY REASON (when present)
You may use
^VIR\s+RECU\s+(\d+)\nDE:\s+([\s\S]*?)(\nMOTIF:\s+(.*))?$
See a regex test #1 and a regex test #2
Regex details
- ^ - start of string
- VIR\s+RECU\s+ - VIR, 1+ whitespaces, RECU and again 1+ whitespaces
- (\d+) - Group 1: one or more digits
- \nDE: - a newline and the DE: substring
- \s+ - 1+ whitespaces
- ([\s\S]*?) - Group 2: any 0+ chars, as few as possible
- (\nMOTIF:\s+(.*))? - an optional capturing group #3:
  - \nMOTIF: - a newline and the MOTIF: string
  - \s+ - 1+ whitespaces
  - (.*) - Group 4: any 0+ chars other than line break chars, as many as possible
- $ - end of string.
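As a minimal sketch of how the pattern above might be used with preg_match(), assuming the input lines are separated by a plain "\n": the helper name parseTransfer and the array keys reference, sender and motif are hypothetical and only serve the illustration. The two sample strings are the two cases from the question.

```php
<?php
// Pattern from the answer above, wrapped in "/" delimiters for preg_match().
// Single quotes keep the backslashes intact so PCRE sees \s, \d and \n.
$pattern = '/^VIR\s+RECU\s+(\d+)\nDE:\s+([\s\S]*?)(\nMOTIF:\s+(.*))?$/';

// The two cases from the question (assumes "\n" line endings).
$case1 = "VIR RECU 998721687978\nDE: Mrs PAUL SMITH\n564\nMOTIF: ANY REASON";
$case2 = "VIR RECU 998721687978\nDE: Mrs PAUL SMITH\n564";

// Hypothetical helper: returns the extracted parts, or null when there is no match.
function parseTransfer(string $pattern, string $text): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    return [
        'reference' => $m[1],
        'sender'    => $m[2],
        // Group 4 is absent from $m when the optional MOTIF part did not match.
        'motif'     => $m[4] ?? null,
    ];
}

$result1 = parseTransfer($pattern, $case1);
$result2 = parseTransfer($pattern, $case2);
```

With these inputs, both results carry the same reference and the same two-line sender value; motif is "ANY REASON" for case 1 and null for case 2. If the real data uses "\r\n" line endings, the literal \n in the pattern would need adjusting (for example to \R).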