I see a lot of answers on SO that pertain to the question but either there are slight differences that I couldn't overcome or maybe i just couldn't repeat the processes shown.
What I am trying to accomplish is to use CURL to get the HTML from a Google+ business page, iterate over the HTML and for each review of the business scrape the reviews HTML for display on that businesses non google+ webpage.
Every review shares this parent div structure:
<div class="ZWa nAa" guidedhelpid="userreviews"> .....
Thus i am trying to do a foreach loop based on finding and grabbing the div and innerhtml for each div with attribute: guidehelpid="userreviews"
I am succesfully getting the HTML back via CURL and can parse it when targeting a standard TAG name like "a" or if it had an ID, but iterating over the HTML using the PHP default parser when looking for a attribute name is problematic:
How can I take this successful code below and make it work like intended as shown in the second code which of course is wrong?
WORKING CODE (Finds,gets, echo's all "a" tags in $output)
$url = "https://plus.google.com/+Mcgowansac/about";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);
$DOM = new DOMDocument;
@$DOM->loadHTML($output);
foreach($DOM->getElementsByTagName('a') as $link) {
# Show the <a href>
echo $link->getAttribute('href');
echo "<br />";}
THEORETICALLY NEEDED CODE: (Find every review by custom attribute in HTML and echo them)
$url = "https://plus.google.com/+Mcgowansac/about";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);
$DOM = new DOMDocument;
@$DOM->loadHTML($output);
foreach($DOM->getElementsByTagName('div[guidehelpid=userreviews]') as $review) {
echo $review;
echo "<br />"; }
Any help i correcting this would be appreciated. I would prefer not to use "simple_html_dom" if I can accomplish this without it.
I suggest and you could use an DOMXpath
in this case too. Example:
$url = "https://plus.google.com/+Mcgowansac/about";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($output);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$review = $xpath->query('//div[@guidedhelpid="userreviews"]');
if($review->length > 0) { // if it exists
echo $review->item(0)->nodeValue;
// echoes
// John DeRemer reviewed 3 months ago Last fall, we had a major issue with mold which required major ... and so on
}