I want to solve the following problem using regular expressions alone: I have a multi-line string in which each block of information is delimited by Z! at one end and S0634 at the other, like this:
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 1
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 2
-Name:
-Date: 6/24/2014
-Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 3
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
I want to capture only those groups that are bounded by the two delimiter sequences mentioned above AND contain the word Folios (the group in the middle does not have it; only the other two do).
I know how to split the string into groups, and I can also return the group that does not have it, e.g. with (Z!\s*EXT(?:(?!-Folios:).)*?S0634). However, how to capture the groups that do have it eludes me. I am only interested in single-expression regex solutions (I know I could split the string into groups and then check each group).
Use this:
$regex = '~(?sm)Z!(?:(?!S0634).)*?Folios.*?S0634~';
preg_match_all($regex, $yourstring, $matches);
// See all matches
print_r($matches[0]);
In the demo, you can see that the middle group is excluded.
Output:
Array
(
[0] => Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 1
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
[1] => Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 3
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
)
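If the demo is not at hand, the result can be reproduced with a minimal self-contained sketch like the one below. The variable names ($sample, $payrolls) and the extra step that extracts the payroll numbers are only illustrative additions; the regex is the same one as above.

```php
<?php
// Sample input from the question, in a nowdoc so the backslashes stay literal.
$sample = <<<'TXT'
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 1
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 2
-Name:
-Date: 6/24/2014
-Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 3
-Name:
-Folios:
-Date: 6/24/2014
-Subformat: S0634
TXT;

$regex = '~(?sm)Z!(?:(?!S0634).)*?Folios.*?S0634~';
$count = preg_match_all($regex, $sample, $matches);

// Illustrative check: pull the payroll number out of each matched block
// to see which groups survived the filter.
$payrolls = [];
foreach ($matches[0] as $block) {
    if (preg_match('~Payroll No\.: (\d+)~', $block, $m)) {
        $payrolls[] = (int) $m[1];
    }
}

echo "Matched $count blocks, payroll numbers: " . implode(', ', $payrolls) . "\n";
// prints: Matched 2 blocks, payroll numbers: 1, 3
```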
Explanation
(?s) activates DOTALL mode, allowing the dot to match across lines.
(?m) turns on multi-line mode, allowing ^ and $ to match on each line (this pattern uses neither anchor, so the flag is harmless but not required).
Z! matches the starting delimiter.
(?:(?!S0634).)*? lazily matches characters one at a time, each only if S0634 does not begin at that position, up to...
Folios, the word the group must contain.
.*?S0634 lazily matches the rest of the string up to the closing delimiter.
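To see why the negative lookahead is needed, it may help to compare the pattern with a naive version that drops it. The sketch below uses a shortened stand-in for the question's data (the strings "a", "b", "c" and the variable names are made up for illustration); the middle block has no Folios line.

```php
<?php
// Abbreviated stand-in for the question's data: block "b" has no Folios line.
$text = "Z! a\n-Folios:\nS0634\nZ! b\nS0634\nZ! c\n-Folios:\nS0634\n";

// Naive version: nothing stops .*? from running past a closing delimiter.
$naive = '~(?s)Z!.*?Folios.*?S0634~';
// Version from the answer: the lookahead forbids crossing S0634 before Folios.
$tempered = '~(?s)Z!(?:(?!S0634).)*?Folios.*?S0634~';

preg_match_all($naive, $text, $n);
preg_match_all($tempered, $text, $t);

// The naive second match starts at block "b" and swallows block "c" with it.
var_dump(strpos($n[0][1], 'Z! b') === 0);            // bool(true)
// The lookahead version skips block "b" and returns block "c" on its own.
var_dump($t[0][1] === "Z! c\n-Folios:\nS0634");      // bool(true)
```

Both patterns find two matches here, but only the lookahead version keeps each match inside a single Z! ... S0634 block.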