regex, regex-negation, regex-greedy

Replace line breaks except inside <pre> tags, including when the <pre> content contains angle brackets (< or >)


I removed all the line breaks outside <pre> tags using the regex from the answer to a related question:

\n(?![^<]*<\/pre>)

It worked fine until the content inside a <pre> tag contained a < or > character.

For example, with input of:

<p>Test contennt for regex
with line breaks</p>
<pre>code block 
with multi line content
working fine</pre>
<pre class="brush:C#">
test line break before 
open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>

Output is

<p>Test contennt for regexwith line breaks</p><pre>code block 
with multi line content
working fine</pre><pre class="brush:C#">test line break before open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>

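which is not correct: in the second <pre> block, the line breaks that come before the < are removed, although every line break inside <pre> should be kept.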

See this regex101.
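
The failure can also be reproduced outside regex101. The following is a minimal sketch, assuming Python's re module (the question does not name a host language); the sample string is a shortened, hypothetical stand-in for the input above.

```python
import re

# The pattern from the question: remove a newline unless the text after it
# reaches "</pre>" without passing through any other "<".
ORIGINAL = re.compile(r"\n(?![^<]*</pre>)")

# Hypothetical shortened sample: the second line inside <pre> contains a "<".
sample = "<p>a\nb</p>\n<pre>x\ny < z\nw</pre>"

result = ORIGINAL.sub("", sample)
print(result)
```

The newline after "x" is inside the <pre> block, but [^<]* stops at the stray < before it can reach </pre>, so the negative lookahead succeeds and that newline is removed as well.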


Solution

  • Try this:

    /\n(?=((?!<\/pre).)*?(<pre|$))/sg
    

    The idea is to use one large lookahead. The

    ((?!<\/pre).)*?
    

    lazily matches any character (including newlines, because the s flag lets . match them), and the inner negative lookahead (?!<\/pre) requires that each matched character is not the < that begins </pre. The lookahead then ends with

    (<pre|$)
    

    which must match either <pre (indicating that the original newline was not inside a <pre block) or the end of the input.

    https://regex101.com/r/cjZQO9/2

    With input of

    <p>Test contennt for regex
    with line breaks</p>
    <pre>code block 
    with multi line content
    working fine</pre>
    text
    more text
    <pre class="brush:C#">
    test line break before 
    open paranthesis < is not working fine
    line breaks after paranthesis
    is accepted
    </pre>
    text
    

    output is

    <p>Test contennt for regexwith line breaks</p><pre>code block 
    with multi line content
    working fine</pre>textmore text<pre class="brush:C#">
    test line break before 
    open paranthesis < is not working fine
    line breaks after paranthesis
    is accepted
    </pre>text
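
The same lookahead can be tried in a host language. This is a sketch under a few assumptions: it uses Python's re module, where re.DOTALL stands in for the s flag and re.sub replaces every match (the g flag); the capturing groups are written as non-capturing ones, which does not change what is matched; and the function name and sample string are hypothetical.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE so each part can be commented.
OUTSIDE_PRE_NEWLINE = re.compile(
    r"""
    \n                      # the line break that may be removed
    (?=                     # only if, looking ahead,
        (?:(?!</pre).)*?    # we can walk forward without meeting "</pre"
        (?:<pre|$)          # until the next "<pre" or the end of the input
    )
    """,
    re.DOTALL | re.VERBOSE,
)


def strip_newlines_outside_pre(html: str) -> str:
    """Remove line breaks that are not inside a <pre>...</pre> block."""
    return OUTSIDE_PRE_NEWLINE.sub("", html)


# Hypothetical sample with a "<" inside the <pre> block.
sample = (
    "<p>Test content\n"
    "with line breaks</p>\n"
    '<pre class="brush:C#">\n'
    "a < b\n"
    "c\n"
    "</pre>\n"
    "text\n"
)

print(strip_newlines_outside_pre(sample))
```

Every newline inside the <pre> block survives because the walk forward always runs into </pre before it can find another <pre or the end of the input. Like the original pattern, this assumes <pre> blocks are not nested and that the markup is well formed.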