
Symfony Dom Crawler missing closing tag in template


I use the Symfony DomCrawler to read and save an HTML document that contains a template, but the closing HTML tags inside the template go missing. Here is an example:

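<?php

require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use Symfony\Component\DomCrawler\Crawler;
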
$htmlString = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<h1>Title</h1>
<script id="my-template" type="text/template">
    <p>Hello</p>
    <div>{{ Name }}</div>
</script>
</body>
</html>
HTML;

$crawler = new Crawler($htmlString);

// Concatenate the inner HTML of every <body> match into one string.
$output = implode(
    '',
    $crawler->filterXPath('//body')->each(
        function (Crawler $node) {
            return $node->html();
        }
    )
);

echo $output;

I would expect something like:

<h1>Title</h1>
<script id="my-template" type="text/template">
    <p>Hello</p>
    <div>{{ Name }}</div>
</script>

But I get:

<h1>Title</h1>
<script id="my-template" type="text/template">
    <p>Hello
    <div>{{ Name }}
</script>

Do you have any idea why the DOM Crawler is omitting the closing tags?


Solution

  • I did some debugging and isolated the issue with the following code (the Crawler works with DOMElement objects internally):

    $htmlString = <<<'HTML'
        <script id="my-template" type="text/template">
            <div> Name </div>;      
        </script>
    HTML;
    
    // Buffer libxml parse errors instead of emitting them as PHP warnings.
    libxml_use_internal_errors(true);

    $el = new \DOMDocument();
    $el->loadHTML($htmlString);
    echo $el->saveHTML($el);
    

    Output (the doctype, html and head elements are added automatically, which is not important here):

      <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><head><script id="my-template" type="text/template">
                <div> Name ;        
            </script></head></html>
    

    As you can see, it shows the same problem: the closing tag inside the script element is dropped.

    If you comment out libxml_use_internal_errors(true); you'll get a warning:

    DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 2
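Instead of commenting out that line, the same messages can be read from libxml's internal error buffer. The following is a minimal sketch using only PHP's built-in libxml functions; the variable names are illustrative, and whether any error is reported at all may depend on the libxml2 version PHP was built against.

```php
<?php

$htmlString = <<<'HTML'
<script id="my-template" type="text/template">
    <div> Name </div>
</script>
HTML;

// Buffer libxml problems instead of turning them into PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlString);

// Each entry is a LibXMLError object with public message and line fields.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the buffer and restore the earlier setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

This keeps error suppression on during normal runs while still making the parser's complaints available for logging or debugging.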

    I also did some research on this error and found that it's a fairly old bug in the libxml2 library rather than strictly a PHP issue:

    https://bugs.php.net/bug.php?id=52012

    I'm seeing this on PHP 7.0.6, so I guess it still hasn't been fixed.

    In general, the problem seems to lie in how the libxml library parses tags inside script elements, so you will either have to avoid the Crawler or avoid placing HTML templates in script tags. The right solution depends on what you're trying to achieve.
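If the templates have to stay in script tags, one possible workaround is to hide their bodies from the parser and put them back after serialization. The sketch below is not part of the Crawler or DOM API: `protectTemplates()` and `restoreTemplates()` are hypothetical helpers, the `@@TEMPLATE_n@@` placeholder format is arbitrary, and the regular expression assumes the attribute is written exactly as `type="text/template"`.

```php
<?php

// Hypothetical helper: replace the body of every text/template script with
// a placeholder so libxml never sees the markup inside it.
function protectTemplates(string $html, array &$stash): string
{
    $pattern = '~(<script\b[^>]*type="text/template"[^>]*>)(.*?)(</script>)~is';

    return preg_replace_callback($pattern, function (array $m) use (&$stash) {
        $key = '@@TEMPLATE_' . count($stash) . '@@';
        $stash[$key] = $m[2]; // remember the original template body
        return $m[1] . $key . $m[3];
    }, $html);
}

// Hypothetical helper: swap the placeholders back for the stored bodies.
function restoreTemplates(string $html, array $stash): string
{
    return strtr($html, $stash);
}

$html = <<<'HTML'
<body>
<h1>Title</h1>
<script id="my-template" type="text/template">
    <p>Hello</p>
    <div>{{ Name }}</div>
</script>
</body>
HTML;

$stash = [];
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML(protectTemplates($html, $stash));

// Serialize the children of <body>, then restore the template bodies.
$body = $doc->getElementsByTagName('body')->item(0);
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

$restored = restoreTemplates($inner, $stash);
echo $restored;
```

The same two helpers could wrap a Crawler call instead of DOMDocument: protect the string before passing it to the constructor and restore the result of `html()`. Because the template body never reaches the parser, it cannot be queried or modified through the DOM in this approach.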