Tags: php, domdocument, file-get-contents, codeigniter-4

How to get HTML with PHP's file_get_contents and then unminify it


I want to fetch the HTML content of this page as a string using file_get_contents:

https://www.emitennews.com/search/

Then I want to unminify the HTML.

This is what I have tried so far to unminify it:

$html = file_get_contents("https://www.emitennews.com/search/");
$dom = new \DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$dom->formatOutput = true;
print $dom->saveXML($dom->documentElement);

But the code above gives this error:

DOMDocument::loadHTML(): Tag header invalid in Entity, line: 1

What is the proper way to do this?


Solution

  • This code works:

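    $html = file_get_contents("https://www.emitennews.com/search/");
    $dom = new \DOMDocument();
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $dom->preserveWhiteSpace = false;
    // The XML-declaration prefix hints to the parser that the input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED);
    $dom->formatOutput = true;
    print $dom->saveXML($dom->documentElement);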
    

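    The problem is that the site uses HTML5. DOMDocument::loadHTML() relies on libxml2's HTML parser, which targets HTML 4 and so reports HTML5 elements such as <header> as invalid tags. These are only warnings and the document is still parsed, so we can suppress them by calling this before loadHTML():

    libxml_use_internal_errors(true);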
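
If you would rather inspect the suppressed warnings than discard them, the same steps can be wrapped in a small helper. This is only a sketch: the function name prettyPrintHtml and the inline sample markup are hypothetical and stand in for the page fetched with file_get_contents. It relies on libxml_get_errors() and libxml_clear_errors(), and restores the previous error-handling setting when it is done.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the answer above).
// Returns [formatted markup, list of libxml warning messages].
function prettyPrintHtml(string $html): array
{
    // Buffer libxml warnings internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    // Same UTF-8 hint as in the answer's code.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_HTML_NOIMPLIED);
    $dom->formatOutput = true;

    // Copy out whatever the parser complained about, then empty the buffer.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom->saveXML($dom->documentElement), $messages];
}

// Sample input (an assumption for illustration; no network access needed).
$sample = '<html><body><header><h1>Title</h1></header></body></html>';
[$pretty, $warnings] = prettyPrintHtml($sample);

print $pretty . "\n";
foreach ($warnings as $warning) {
    print "libxml warning: " . $warning . "\n";
}
```

Whether any warnings are reported for a tag like <header> depends on the libxml2 version PHP was built against, so the list may be empty on some systems. When fetching a real page, it is also worth checking that file_get_contents did not return false before passing the result on.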