php regex laravel preg-replace

Regex: Remove all comments from an HTML file but preserve the same number of lines


If a comment in a file spans 6 of its lines, the comment should be removed and replaced with as many empty lines as the comment occupied.

Here is a small demonstration of what I mean. Suppose file.html has 10 lines:

    line 1 : <!-- text
    line 2 :      text
    line 3 :      text
    line 4 :      empty line
    line 5 :      text
    line 6 : -->
    line 7 :empty line
    line 8 :text
    line 9 :empty line
    line 10 :text

The expected output would be :

    line 1 :empty line
    line 2 :empty line
    line 3 :empty line
    line 4 :empty line
    line 5 :empty line
    line 6 :empty line
    line 7 :empty line
    line 8 :text
    line 9 :empty line
    line 10 :text

The pattern I am currently using, preg_replace('/(?=<!--)([\s\S]*?)-->/', '', $contents);, replaces each comment with an empty string, which does not preserve the number of lines the file previously had.

Note that any solution needs to keep the structure of the file as it was, so that the text on lines 8 and 10 does not change position within the file.

Edit: I have no idea why this was flagged as a duplicate. It is not similar to the supposed duplicate, which asks in general how to parse the DOM, whereas my question is specifically about removing commented text from a file without altering the number of lines in that file.


Solution

  • You may use this regex for searching:

    (?:^\h*<!--|(?<!\A|-->\n)\G).*\R
    

    and replace each match with "\n". The m (multiline) flag is needed so that ^ matches at the start of every line.

    RegEx Details:

    • (?:: Start non-capture group
      • ^: Start of a line
      • \h*<!--: Match 0 or more horizontal whitespace characters followed by <!--
      • |: OR
      • (?<!\A|-->\n): Negative lookbehind that rejects the match if the current position is the start of the input or is immediately preceded by --> and a line break
      • \G: Assert the position where the previous match ended
    • ): End non-capture group
    • .*\R: Match the remaining characters in the line, followed by a line break
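
To tie the pattern back to the preg_replace call in the question, here is a minimal sketch of how it might be applied in PHP. The function name blankOutComments and the inline sample are only illustrative, not part of the original answer. It assumes each comment starts at the beginning of a line (optionally after spaces or tabs), that the closing --> is directly followed by \n, and that a line break follows the last comment line; with trailing spaces or \r\n after -->, the \G branch would keep consuming the lines that follow.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function blankOutComments(string $contents): string
{
    // "m" flag: ^ matches at the start of every line, not only at offset 0.
    $pattern = '/(?:^\h*<!--|(?<!\A|-->\n)\G).*\R/m';

    // Each matched line (comment start or continuation) becomes a bare "\n".
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, "\n", $contents) ?? $contents;
}

// The 10-line sample from the question.
$contents = implode("\n", [
    '<!-- text',
    '     text',
    '     text',
    '',
    '     text',
    '-->',
    '',
    'text',
    '',
    'text',
]);

$result = blankOutComments($contents);

// Compare line-break counts before and after the replacement.
var_dump(substr_count($contents, "\n") === substr_count($result, "\n"));
```

For the sample from the question, the six comment lines should each collapse to an empty line, leaving the text on lines 8 and 10 where it was.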