
Convert HTML tags between <code></code> to special characters


I'm writing a blogging web app in PHP, and I would like the blog to be able to show code snippets to readers.

I publish new articles to the blog as markdown files, so after converting from markdown to HTML the result looks something like this:

<h3>This is sample HTML Coding</h3>

<pre><code>
    <html>
        <body>
            Hello World
        </body>
    </html>
</code></pre>

<h3>This is another sample HTML Coding</h3>

<pre><code>
    <html>
        <body>
            Another Hello World
        </body>
    </html>
</code></pre>

I cannot simply run a PHP function like htmlspecialchars() over the whole document, because I need the headings and code blocks themselves to be rendered as HTML. What I actually need is to convert only the special characters between <code></code>.

Right now the only approach I can think of is a regex. I came up with two possible directions:

  1. Match all <, >, </ only between <code></code> and use preg_replace() with special characters on each of those.

  2. Match all characters between each <code></code> (because I would have several code blocks on each article) then use preg_replace() with htmlspecialchars()

Please advise

  1. Which choice should I use?
  2. What regex would do the job?

P.S.

I pasted the HTML output from the markdown conversion into regex101.com and tried a few patterns, e.g. (?<=<code>)[<](?=<\/code>)/g for choice 1 and (?<=<code>)[\s\S]*(?=<\/code>)/g for choice 2, but neither works.

Edited

This is the result I expect:

    <h3>This is sample HTML Coding</h3>

    <pre><code>
        &lt;html&gt;
            &lt;body&gt;
                Hello World
            &lt;/body&gt;
        &lt;/html&gt;
    </code></pre>

    <h3>This is another sample HTML Coding</h3>

    <pre><code>
        &lt;html&gt;
            &lt;body&gt;
                Another Hello World
            &lt;/body&gt;
        &lt;/html&gt;
    </code></pre>

Solution

  • It's not clear to me why you want to do this, but you should use a callback function here:

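    $html = preg_replace_callback(
        // Lazily match everything between <code> and </code>; the s flag lets . span newlines
        '~(?<=<code>).*?(?=</code>)~s',
        function ($m) {
            // $m[0] is the text between the tags; escape it so it displays literally
            return htmlentities($m[0]);
        },
        $html
    );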
    

    Working Demo
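
For reference, here is a minimal, self-contained sketch of the same callback idea run against input shaped like the question's sample. The helper name escapeCodeBlocks() is hypothetical, not from the answer above. It also differs in two small ways that are my assumptions about what is wanted: it uses capture groups instead of lookarounds so that an opening tag with attributes (e.g. <code class="html">) would also match, and it uses htmlspecialchars() rather than htmlentities(), since the question only asks for the HTML-significant characters to be converted.

```php
<?php
// Hypothetical helper: escapes only the text inside <code>...</code> elements.
function escapeCodeBlocks(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening tag (attributes allowed), group 2: body, group 3: closing tag.
        // The lazy .*? stops at the nearest </code>; the s flag lets . match newlines.
        '~(<code\b[^>]*>)(.*?)(</code>)~s',
        function (array $m): string {
            // Leave the tags untouched and escape only the body.
            return $m[1] . htmlspecialchars($m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$html = "<h3>Sample</h3>\n<pre><code>\n<html>\n<body>Hello World</body>\n</html>\n</code></pre>";
echo escapeCodeBlocks($html);
```

Two caveats with this sketch: a snippet that itself contains the literal text </code> will end the match early, and running the function twice on the same HTML will double-encode entities (&lt; becomes &amp;lt;), so it should be applied exactly once after the markdown conversion.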