I'm trying to grab the href value of an element using PHP, but I'm having some trouble. Here's a snippet of my code.
<?php
ini_set("log_errors", 1);
ini_set("error_log", "php-error.log");
$target_url = "http://foo.bar";
$request = $target_url;
$html = $this->scraper($request);
$dom = new DOMDocument();
$dom->loadHTML($html);
// Error point - $dom is empty
error_log("dom:");
error_log($dom);
$xpath = new DOMXPath($dom);
error_log("setting target url");
$target_url = $xpath->query("//*[@class='foo_bar']/href");
?>
Logging $html gives the full HTML of the page, as expected. From what I've found searching, my XPath should work. However, when I log $dom after loadHTML(), the log entry is blank. I've spent a few hours trying to work out why, with no luck.
Does anyone have any ideas/anything I could try?
Edit: here is the log output:
[30-Sep-2015 13:51:59 America/New_York] dom:
[30-Sep-2015 13:51:59 America/New_York] setting target url
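You should check that the HTML was actually loaded into the DOM. Passing $dom straight to error_log() can't tell you that: error_log() expects a string, and a DOMDocument object can't be converted to one, so serialize the document first. You can use a debugger, logging, or var_dump() for the check: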
var_dump($dom->saveXml());
If it wasn't loaded into the DOM, take a step back and verify that the scraper actually fetched the HTML.
var_dump($html);
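If you'd rather keep using the log file, the same check can be done with error_log() as long as you pass it a string. Below is a minimal sketch; the literal $html is an assumption standing in for whatever $this->scraper() returns. Real-world pages often make loadHTML() emit libxml warnings, so the sketch collects those with libxml_use_internal_errors() and logs them too.

```php
<?php
// Assumption: stand-in for the result of $this->scraper($request),
// i.e. the page's HTML as a string.
$html = '<html><body><a class="foo_bar" href="http://example.com">Example</a></body></html>';

// Collect libxml parse warnings instead of letting them be emitted directly.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
if (!$dom->loadHTML($html)) {
    error_log('loadHTML() failed');
}
foreach (libxml_get_errors() as $error) {
    error_log('libxml: ' . trim($error->message));
}
libxml_clear_errors();

// error_log() needs a string, so log the serialized markup, not the object.
error_log('dom: ' . $dom->saveHTML());
```

If the "dom:" entry now shows the page markup, the loading step is fine and the problem is in the XPath.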
If the HTML was loaded into the DOM, you will still need to fix the XPath expression. /href selects child elements named href, but I would expect href to be an attribute node, which needs the @ prefix:
//*[@class='foo_bar']/@href
You seem to want to read it as a string value, so cast it:
string(//*[@class='foo_bar']/@href)
That only works with DOMXPath::evaluate(); DOMXPath::query() can only return node lists.
$target_url = $xpath->evaluate("string(//*[@class='foo_bar']/@href)");
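To see the difference between /href and /@href, and between query() and evaluate(), you can run them side by side against a tiny document. The two foo_bar links below are made-up test data, and this is only a sketch: with query() you get DOMAttr nodes and read each one's value yourself, while string() inside evaluate() converts just the first match.

```php
<?php
$document = new DOMDocument();
$document->loadHTML(
    '<a class="foo_bar" href="http://a.example">A</a>' .
    '<a class="foo_bar" href="http://b.example">B</a>'
);
$xpath = new DOMXPath($document);

// "/href" looks for child *elements* named href, so nothing matches.
var_dump($xpath->query("//*[@class='foo_bar']/href")->length); // int(0)

// "/@href" selects attribute nodes; query() returns them in a DOMNodeList.
foreach ($xpath->query("//*[@class='foo_bar']/@href") as $attribute) {
    var_dump($attribute->value);
}

// string(...) converts only the first matching node (in document order),
// and only evaluate() can return such a plain string result.
var_dump($xpath->evaluate("string(//*[@class='foo_bar']/@href)"));
```

Use the query() loop when the page can contain several matching elements and you need every href.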
A small example:
$document = new DOMDocument();
$document->loadHtml('<a href="http://example.com">Example</a>');
$xpath = new DOMXpath($document);
var_dump($xpath->evaluate('string(//a[1]/@href)'));
Output:
string(18) "http://example.com"