I have an array of URLs and I'm fetching their contents one by one, but whenever a URL returns a 404, file_get_contents() fails.
function pageContent(string $url): \DOMDocument
{
    $html = cache()->rememberForever($url, function () use ($url) {
        $opts = [
            "http" => [
                "method" => "GET",
                "header" => "Accept: text/html\r\n"
            ]
        ];
        try {
            $context = stream_context_create($opts);
            $file = file_get_contents($url, false, $context);
            return $file;
        } catch (\Exception $e) {
            // The exception is swallowed here, so the closure returns null.
        }
    });
    $parser = new \DOMDocument();
    libxml_use_internal_errors(true);
    $parser->loadHTML($html = mb_convert_encoding($html, 'HTML-ENTITIES', 'ASCII, JIS, UTF-8, EUC-JP, SJIS'));
    return $parser;
}
I wrapped the call in try/catch, but now I get this error instead, this time from loadHTML():
DOMDocument::loadHTML(): Empty string supplied as input
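For context: on a 404, file_get_contents() does not throw by itself. It emits an E_WARNING and returns false. Assuming this is Laravel (the cache() helper suggests so), the try/catch only catches anything because Laravel's error handler converts warnings into ErrorException. The sketch below is a hypothetical helper, `fetchOrNull`, that is not part of the question's code; it normalizes every failure to null so the caller has a single value to check.

```php
<?php
// Hypothetical helper (not from the question): fetch a URL or file path and
// return its body, or null when the request fails or the body is empty.
function fetchOrNull(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Accept: text/html\r\n",
        ],
    ]);

    // On failure (e.g. an HTTP 404) file_get_contents() emits a warning and
    // returns false; @ silences the warning so the return value can be checked.
    $body = @file_get_contents($url, false, $context);

    if ($body === false || $body === '') {
        return null;
    }

    return $body;
}
```

Inside the rememberForever() callback you could return fetchOrNull($url) instead of wrapping the call in try/catch. Because the warning is suppressed with @, Laravel's handler should not turn it into an exception.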
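When the request fails, the catch block swallows the exception and the closure returns nothing, so $html is null and mb_convert_encoding() turns it into an empty string. Check whether $html is empty before passing it to loadHTML(), and change the return type to ?\DOMDocument so the function is allowed to return null: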
if (!empty($html)) {
    $parser->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'ASCII, JIS, UTF-8, EUC-JP, SJIS'));
} else {
    return null;
}
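Put together, the parsing step could look like the sketch below. It is a hypothetical, self-contained function (`parseHtmlOrNull` is not from the question) that accepts whatever the cache callback returned (a string, false or null) and declares the nullable return type ?\DOMDocument. The encoding conversion from the question is left out to keep it short, and the sketch assumes PHP 8.0+ for the union parameter type.

```php
<?php
// Hypothetical sketch: parse HTML only when there is something to parse,
// returning null otherwise (hence the nullable return type).
function parseHtmlOrNull(string|false|null $html): ?\DOMDocument
{
    // empty() is true for null, false and '' (and also for the string "0").
    if (empty($html)) {
        return null;
    }

    $parser = new \DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $parser->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $parser;
}
```

Callers then need to handle the null case, for example by skipping that URL when looping over the array.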