Tags: php, regex, html-parsing

Wrap an HTML tag around each letter of the text in HTML code


I'm using PHP 7.3+. I want to wrap an HTML tag around each letter in my HTML code, but only around the text, not around the HTML tags themselves. For example:

<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">

  <title>The HTML5 Herald</title>
  <meta name="description" content="The HTML5 Herald">
  <meta name="author" content="SitePoint">

  <link rel="stylesheet" href="css/styles.css?v=1.0">

</head>

<body>
  <h1>This is the first heading</h1>
  <p>Some Paragraph</p>
  <ul>
    <li>List item 1</li>
    <li>List item 2</li>
  </ul>
  <script src="js/scripts.js"></script>
</body>
</html>

I want to wrap <span class="customclass"></span> around each letter of the page's text, which is:

This is the first heading
Some Paragraph
List item 1
List item 2
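The text listed above is exactly the set of non-blank text nodes inside <body>. A minimal sketch of how PHP's DOM extension can list them, assuming a shortened copy of the question's document; the XPath predicate `normalize-space()` drops the whitespace-only nodes between tags, and `<title>` is excluded because it lives in `<head>`:

```php
<?php
// Sketch: collect the non-blank text nodes inside <body>, i.e. the only text
// the spans should go around. $html is a shortened copy of the question's page.
$html = '<!doctype html><html lang="en"><head><meta charset="utf-8">'
      . '<title>The HTML5 Herald</title></head><body>'
      . '<h1>This is the first heading</h1><p>Some Paragraph</p>'
      . '<ul><li>List item 1</li><li>List item 2</li></ul>'
      . '<script src="js/scripts.js"></script></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$texts = [];
// normalize-space() is empty for whitespace-only nodes, so the predicate skips them.
foreach ($xpath->query('//body//text()[normalize-space()]') as $node) {
    $texts[] = $node->nodeValue;
}
print_r($texts);
```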

Here is the expected output:

<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">

  <title>The HTML5 Herald</title>
  <meta name="description" content="The HTML5 Herald">
  <meta name="author" content="SitePoint">

  <link rel="stylesheet" href="css/styles.css?v=1.0">

</head>

<body>
  <h1><span class="customclass">T</span><span class="customclass">h</span><span class="customclass">i</span><span class="customclass">s</span> <span class="customclass">i</span><span class="customclass">s</span> <span class="customclass">t</span><span class="customclass">h</span><span class="customclass">e</span> <span class="customclass">f</span><span class="customclass">i</span><span class="customclass">r</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">h</span><span class="customclass">e</span><span class="customclass">a</span><span class="customclass">d</span><span class="customclass">i</span><span class="customclass">n</span><span class="customclass">g</span></h1>
  <p><span class="customclass">S</span><span class="customclass">o</span><span class="customclass">m</span><span class="customclass">e</span> <span class="customclass">P</span><span class="customclass">a</span><span class="customclass">r</span><span class="customclass">a</span><span class="customclass">g</span><span class="customclass">r</span><span class="customclass">a</span><span class="customclass">p</span><span class="customclass">h</span></p>
  <ul>
    <li><span class="customclass">L</span><span class="customclass">i</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">i</span><span class="customclass">t</span><span class="customclass">e</span><span class="customclass">m</span> <span class="customclass">1</span></li>
    <li><span class="customclass">L</span><span class="customclass">i</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">i</span><span class="customclass">t</span><span class="customclass">e</span><span class="customclass">m</span> <span class="customclass">2</span></li>
  </ul>
  <script src="js/scripts.js"></script>
</body>
</html>

I have this regex, but it only wraps the span around the last two words, and what I need is a span around each letter:

preg_replace('/\b[\w\'-]+\W+\K.*\S/s', '<span class="customclass">$0</span>', $html);
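Why this pattern gives one big span: `\K` discards the first word and the non-word characters after it from the match, and `.*\S` (with the `s` flag) then runs to the last non-whitespace character, so that whole remainder lands in a single span. To get one span per character, the pattern has to match a single character. A minimal sketch for a plain-text string that has already been separated from the markup (`wrapEachLetter()` is a hypothetical helper name):

```php
<?php
// Wrap every non-whitespace character of a plain-text string in a span.
// wrapEachLetter() is a hypothetical helper; the input must not contain markup or entities.
function wrapEachLetter(string $text): string
{
    // \S matches exactly one character per match; /u makes it a whole UTF-8
    // character instead of a single byte. $0 is the matched character.
    return preg_replace('/\S/u', '<span class="customclass">$0</span>', $text);
}

echo wrapEachLetter('List item 1'), "\n";
```

Applied to a whole document, the same pattern would also wrap the characters of the tags themselves, which is why the answer below parses the HTML instead.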

How is this possible with PHP?


Solution

  • This code takes the <body> element of the HTML document and recursively walks its child elements and the text inside them. Whenever it finds text, it wraps each letter in a span tag.

    This supports custom tags and only affects the HTML inside the <body> of the document.

    $string ='<!doctype html>
    <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>The HTML5 Herald</title>
      <meta name="description" content="The HTML5 Herald">
      <meta name="author" content="SitePoint">
      <link rel="stylesheet" href="css/styles.css?v=1.0">
    </head>
    <body>
      <h1>This is the first heading</h1>
      <p>Some Paragraph</p>
      <ul>
        <li>List item 1</li>
        <li>List item 2</li>
      </ul>
      <script src="js/scripts.js"></script>
    </body>
    </html>';

    // Wrap every non-blank character of each text node below $node in a span.
    // Whitespace stays plain text, and <script>/<style> contents are left alone.
    function replaceText(DOMNode $node) {
        $dom = $node->ownerDocument;
        // childNodes is live and gets modified below, so iterate over a copy.
        foreach (iterator_to_array($node->childNodes) as $child) {
            if ($child instanceof DOMElement) {
                if (!in_array(strtolower($child->nodeName), ['script', 'style'], true)) {
                    replaceText($child); // recurse into every child element, not just the first
                }
                continue;
            }
            // Skip comments and whitespace-only text (the indentation between tags).
            if (!($child instanceof DOMText) || trim($child->nodeValue) === '') {
                continue;
            }
            // Split into UTF-8 characters; str_split() would cut multibyte letters apart.
            foreach (preg_split('//u', $child->nodeValue, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                if (trim($char) === '') {
                    $new = $dom->createTextNode($char); // keep spaces between words
                } else {
                    // trim($char) === '' (not !trim()) so that "0" is wrapped too.
                    $new = $dom->createElement('span');
                    $new->setAttribute('class', 'customclass');
                    $new->appendChild($dom->createTextNode($char));
                }
                $node->insertBefore($new, $child);
            }
            $node->removeChild($child);
        }
    }

    $previous_value = libxml_use_internal_errors(TRUE);
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $body = $dom->getElementsByTagName('body')->item(0);

    replaceText($body);

    // The spans are real DOM elements, so no html_entity_decode() is needed.
    $html = $dom->saveHTML();

    libxml_clear_errors();
    libxml_use_internal_errors($previous_value);

    echo $html;


    Output:

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>The HTML5 Herald</title>
      <meta name="description" content="The HTML5 Herald">
      <meta name="author" content="SitePoint">
      <link rel="stylesheet" href="css/styles.css?v=1.0">
    </head>
    <body>
      <h1><span class="customclass">T</span><span class="customclass">h</span><span class="customclass">i</span><span class="customclass">s</span> <span class="customclass">i</span><span class="customclass">s</span> <span class="customclass">t</span><span class="customclass">h</span><span class="customclass">e</span> <span class="customclass">f</span><span class="customclass">i</span><span class="customclass">r</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">h</span><span class="customclass">e</span><span class="customclass">a</span><span class="customclass">d</span><span class="customclass">i</span><span class="customclass">n</span><span class="customclass">g</span></h1>
      <p><span class="customclass">S</span><span class="customclass">o</span><span class="customclass">m</span><span class="customclass">e</span> <span class="customclass">P</span><span class="customclass">a</span><span class="customclass">r</span><span class="customclass">a</span><span class="customclass">g</span><span class="customclass">r</span><span class="customclass">a</span><span class="customclass">p</span><span class="customclass">h</span></p>
      <ul>
        <li><span class="customclass">L</span><span class="customclass">i</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">i</span><span class="customclass">t</span><span class="customclass">e</span><span class="customclass">m</span> <span class="customclass">1</span></li>
        <li><span class="customclass">L</span><span class="customclass">i</span><span class="customclass">s</span><span class="customclass">t</span> <span class="customclass">i</span><span class="customclass">t</span><span class="customclass">e</span><span class="customclass">m</span> <span class="customclass">2</span></li>
      </ul>
      <script src="js/scripts.js"></script>
    </body>
    </html>
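A variation on the answer above, sketched under the assumption that selecting the text nodes with XPath is acceptable instead of a hand-written recursion. `wrapLettersInBody()` is a hypothetical helper name, and it returns only the serialized `<body>` element rather than the whole document. The sample input uses a made-up custom tag, `<my-tag>`, to show that unknown tags only produce parser warnings and are kept:

```php
<?php
// Sketch: wrap each non-blank character of the <body> text in a span, selecting
// the text nodes with XPath. wrapLettersInBody() is a hypothetical helper name.
function wrapLettersInBody(string $html, string $class = 'customclass'): string
{
    $previous = libxml_use_internal_errors(true); // custom tags only raise warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Non-blank text nodes under <body>, excluding <script> and <style> contents.
    $query = '//body//text()[normalize-space()][not(ancestor::script) and not(ancestor::style)]';
    // Take a static copy, because the loop replaces the nodes it visits.
    foreach (iterator_to_array($xpath->query($query)) as $text) {
        $parent = $text->parentNode;
        foreach (preg_split('//u', $text->nodeValue, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            if (trim($char) === '') {
                $new = $dom->createTextNode($char); // keep spaces unwrapped
            } else {
                $new = $dom->createElement('span');
                $new->setAttribute('class', $class);
                $new->appendChild($dom->createTextNode($char));
            }
            $parent->insertBefore($new, $text);
        }
        $parent->removeChild($text);
    }

    // Serialize only the processed <body> element.
    return $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
}

$out = wrapLettersInBody('<!doctype html><html><body><my-tag>Hi 0</my-tag><script>var a = 1;</script></body></html>');
echo $out, "\n";
```

The XPath query does the filtering that the recursive version does with `instanceof` checks; which one reads better is mostly a matter of taste.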