php html xpath domdocument

Web scraping with XPath, grabbing img


I am trying to scrape some images from a page, but I can't grab them. I think my path is correct, but the XPath query returns 0 results. Any idea what is wrong with my path?

function pageContent($url)
{

    $html = cache()->rememberForever($url, function () use ($url) {
        return file_get_contents($url);
    });

    $parser = new \DOMDocument();
    $parser->loadHTML($html);
    return $parser;

}

$url = 'https://sumai.tokyu-land.co.jp/osaka';

@$parser = pageContent($url);

$resimler = [];
$rota = new \DOMXPath($parser);
$images = $rota->query("//section//div[@class='p-articlelist-content-left']//div[@class='p-articlelist-content-img']//img");


foreach ($images as $image) {
    $resimler[] = $image->getAttribute("src");
}

var_dump($resimler);

Solution

  • Your query looks for a div[@class='p-articlelist-content-img'], but the element carrying that class is a ul, so the query matches nothing.

    In addition, you should not hide error messages with the @ operator. Use the libxml_use_internal_errors() function instead, as it was intended.

    Finally, the // search in XPath is expensive, so avoid it where possible. You can also get the attribute value directly from the query (I don't know whether this is any more efficient, though).

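    function pageContent(string $url) : \DOMDocument
    {
        // Cache the raw HTML so the page is fetched only once per URL.
        $html = cache()->rememberForever($url, function () use ($url) {
            return file_get_contents($url);
        });
        $parser = new \DOMDocument();
        // Buffer libxml parse warnings instead of suppressing them with @.
        $previous = libxml_use_internal_errors(true);
        $parser->loadHTML($html);
        // Discard the buffered warnings and restore the previous setting.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return $parser;
    }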
    
    $url    = "https://sumai.tokyu-land.co.jp/osaka";
    $parser = pageContent($url);
    $rota   = new \DOMXPath($parser);
    $images = $rota->query("//ul[@class='p-articlelist-content-img']/li/img/@src");
    
    // Initialise the result array so var_dump() works even with no matches.
    $resimler = [];
    foreach ($images as $image) {
        $resimler[] = $image->nodeValue;
    }
    
    var_dump($resimler);
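
To check an expression without fetching the live page, the two queries can be compared on a small inline document. This is only a sketch: the markup below is a hypothetical stand-in that assumes the ul/li/img structure implied by the answer's query, and the image paths are invented for illustration. It also shows one way to read the buffered libxml warnings rather than discarding them.

```php
<?php
// Hypothetical markup standing in for the real page (assumed structure,
// made-up image paths). The class sits on a ul, not on a div.
$html = <<<HTML
<html><body>
<section>
  <div class="p-articlelist-content-left">
    <ul class="p-articlelist-content-img">
      <li><img src="/img/a.jpg"></li>
      <li><img src="/img/b.jpg"></li>
    </ul>
  </div>
</section>
</body></html>
HTML;

$doc = new DOMDocument();

// Buffer parse warnings, keep a copy for inspection, then restore the
// setting that was active before.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// The question's expression: the sample contains no div with this class,
// so the node list is empty.
$old = $xpath->query("//section//div[@class='p-articlelist-content-left']//div[@class='p-articlelist-content-img']//img");

// The answer's expression: selects the src attribute nodes directly.
$new = $xpath->query("//ul[@class='p-articlelist-content-img']/li/img/@src");

$srcs = [];
foreach ($new as $attr) {
    $srcs[] = $attr->nodeValue;
}

printf("question query: %d node(s)\n", $old->length);
printf("answer query:   %d node(s)\n", $new->length);
printf("libxml warnings buffered: %d\n", count($warnings));
```

Comparing the length of each node list on a known fragment makes it easy to tell a wrong expression apart from a page whose markup differs from what you expected.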