regex pcre autoit regex-lookarounds

Capture groups bound by words AND containing certain words


I want to solve the following problem using regular expressions alone: a multi-line string in which each block of information is delimited by Z! at one end and S0634 at the other, like this:

Z! EXT .000 ...HOUSE... L24JN7   
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 1
 -Name: 
 -Folios: 
 -Date: 6/24/2014
 -Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7   
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 2
 -Name:  
 -Date: 6/24/2014
 -Subformat: S0634
Z! EXT .000 ...HOUSE... L24JN7   
PERSONAL COMPUTER\J\039060-L24JN7-000-*****-*****-
Payroll No.: 3
 -Name: 
 -Folios: 
 -Date: 6/24/2014
 -Subformat: S0634

I want to capture only the groups that are bounded by the two character sequences mentioned above AND contain the word Folios (the group in the middle does not have it; only the other two do).

I know how to split the string into groups, and I can also return the group that does not contain it (e.g. (Z!\s*EXT(?:(?!-Folios:).)*?S0634)). However, how to capture the groups that do contain it eludes me. I am only interested in solutions that consist of a single regular expression (I know I could split the string into groups and then check each one).


Solution

  • Use this:

    $regex = '~(?sm)Z!(?:(?!S0634).)*?Folios.*?S0634~';
    preg_match_all($regex, $yourstring, $matches);
    // See all matches
    print_r($matches[0]);
    

    In the demo, you can see that the middle group is excluded.

    Output:

    Array
    (
        [0] => Z! EXT .000 ...HOUSE... L24JN7   
    PERSONAL COMPUTER\J9060-L24JN7-000-*****-*****-
    Payroll No.: 1
     -Name: 
     -Folios: 
     -Date: 6/24/2014
     -Subformat: S0634  
    
        [1] => Z! EXT .000 ...HOUSE... L24JN7   
    PERSONAL COMPUTER\J9060-L24JN7-000-*****-*****-
    Payroll No.: 3
     -Name: 
     -Folios: 
     -Date: 6/24/2014
     -Subformat: S0634
    )
    

    Explanation

    • (?s) activates DOTALL mode, allowing the dot to match across lines
    • (?m) turns on multi-line mode, allowing ^ and $ to match on each line (this pattern contains no anchors, so the flag is harmless but not required)
    • Z! matches the starting delimiter
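    • (?:(?!S0634).)*? lazily matches one character at a time, first checking that S0634 does not start at that position, so this part can never run past the closing delimiter; it expands up to...
    • Folios, the literal word the block must contain
    • .*?S0634 lazily matches the rest of the block, up to and including the closing delimiter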
