php regex pcre

Multi line negative lookahead


I'm not very good with regex (I've been stuck on this one for hours), and I'm struggling to remove all the empty lines between two delimiters ("{|" and "|}").

My regex looks like this (sorry for your eyes): (\{\|)((?:(?!\|\}).)+)(?:\n\n)((?:(?!\|\}).)+)(\|\})

  • (\{\|) : the opening sequence "{|"
  • ((?:(?!\|\}).)+) : any run of characters in which no position starts "|}" (negative lookahead)
  • (?:\n\n) : the empty line I want to delete
  • ((?:(?!\|\}).)+) : any run of characters in which no position starts "|}" (negative lookahead)
  • (\|\}) : the closing sequence "|}"


It works, but it deletes only the last empty line. Can you help me make it work for all the empty lines?

I tried to add a negative lookahead on \n\n with a repeating group around everything, but it did not work.


Solution

  • Several ways:

    The \G-based pattern (only one pattern is needed):

    $txt = preg_replace('~ (?: \G (?!\A) | \Q{|\E ) [^|\n]*+ (?s: (?! \Q|}\E | \n\n) . [^|\n]*)*+ \n \K \n+ ~x', '', $txt);
    

    The \G anchor matches at the start of the string or at the position right after the last successful match. This ensures that successive matches are contiguous.
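
To see the contiguity that \G enforces in isolation, here is a minimal sketch using a deliberately simpler pattern than the one above. The sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample: two runs of digits separated by a letter.
$subject = '123a45';

// Without \G, the engine may skip ahead, so every digit in the string is found.
preg_match_all('~\d~', $subject, $anywhere);

// With \G, each match must begin exactly where the previous one ended
// (or at the start of the string), so matching stops at the "a".
preg_match_all('~\G\d~', $subject, $contiguous);

echo implode(',', $anywhere[0]), "\n";   // 1,2,3,4,5
echo implode(',', $contiguous[0]), "\n"; // 1,2,3
```

The digits after the "a" are never reached in the second call, because no match ends right before them.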

    What I call a \G-based pattern can be schematized like this:

    (?: \G position after a successful match | first match beginning ) reach the target \K target
    

    The "reach the target" part is designed to never match the closing sequence |}. So once the last target before a |} has been found, the \G branch fails, and no further match is possible until the "first match beginning" branch succeeds again at the next opening sequence.

    ~ 
    ### The beginning
    (?:
        \G (?!\A) # contiguous to a successful match
      |
        \Q{|\E # opening sequence
               #; note that you can add `[^{]* (*SKIP)` before it to quickly skip
               #; all failing positions
    
               #; note that if you want to check that the opening sequence is followed by
               #; a closing sequence (without another opening sequence in between), you can
               #; do it here using a lookahead
    )
    
    ### Reaching the target
    #; note that this whole part can also be written as `(?s:(?!\|}|\n\n).)*`
    #; or `(?s:[^|\n]|(?!\|}|\n\n).)*`, but I chose the unrolled pattern, which is
    #; more efficient.
    
    [^|\n]*+ # all that isn't a pipe or a newline
    
    # possibly followed by a character that isn't the start of |} or \n\n
    (?s:   
        (?! \Q|}\E | \n\n ) # negative lookahead
        . # the character
        [^|\n]*
    )*+
    #; adding a `(*SKIP)` here can also be useful when there are no more empty lines
    #; before the closing sequence
    
    ### The target
    
    \n \K \n+ # the \K is a convenient way to define the start of the returned match
              # result; this way, only \n+ is replaced (with nothing)
    ~x
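
As a quick check of the behaviour described above, the single-line version of the pattern can be run on a small sample. This is only a sketch: the function name `removeBlankLinesInBlocks` and the sample text are made up for illustration, and the `?? $txt` fallback is an added assumption to cover the case where `preg_replace` returns null on a PCRE error.

```php
<?php
// Hypothetical wrapper around the \G-based pattern from this answer.
function removeBlankLinesInBlocks(string $txt): string
{
    $pattern = '~ (?: \G (?!\A) | \Q{|\E ) [^|\n]*+ (?s: (?! \Q|}\E | \n\n) . [^|\n]*)*+ \n \K \n+ ~x';

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $txt) ?? $txt;
}

// Made-up sample: blank lines both inside and outside a {| ... |} block.
$input = "before\n\n{|\na\n\nb\n\n\nc\n|}\n\nafter";
$output = removeBlankLinesInBlocks($input);

// Blank lines inside the block are removed; those outside are untouched.
echo json_encode($output), "\n"; // "before\n\n{|\na\nb\nc\n|}\n\nafter"
```

Each replacement removes only the extra newlines after the \K, and the chain of matches stops at the |} because the \G branch can no longer reach a \n\n.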
    

    Or use preg_replace_callback (simpler):

    $txt = preg_replace_callback('~\Q{|\E .*? \Q|}\E~sx', function ($m) {
        return preg_replace('~\n+~', "\n", $m[0]);
    }, $txt);
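
For comparison, here is the callback approach run on a sample with two blocks. Again a sketch only: the function name `collapseNewlinesInBlocks` and the input are hypothetical, and the `?? $txt` fallback is an added assumption for the null-on-error case.

```php
<?php
// Hypothetical wrapper around the preg_replace_callback approach from this answer.
function collapseNewlinesInBlocks(string $txt): string
{
    return preg_replace_callback('~\Q{|\E .*? \Q|}\E~sx', function ($m) {
        // $m[0] is one whole {| ... |} block; squeeze each run of newlines to one.
        return preg_replace('~\n+~', "\n", $m[0]);
    }, $txt) ?? $txt;
}

// Made-up sample: two blocks separated by text surrounded by blank lines.
$input = "{|\nx\n\ny\n|}\n\nmid\n\n{|\np\n\n\nq\n|}";

echo json_encode(collapseNewlinesInBlocks($input)), "\n";
// "{|\nx\ny\n|}\n\nmid\n\n{|\np\nq\n|}"
```

The lazy `.*?` stops at the first |}, so each block is handled on its own and the blank lines between blocks are left alone.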
    
