php dom file-get-contents simple-html-dom

Grab price from URL


I'm trying to get a price from any given URL using simple-html-dom. The example code I used, which works well, is from here: http://www.sanwebe.com/2013/06/extract-url-content-like-facebook-with-php-and-jquery

//Include PHP HTML DOM parser (requires PHP 5 +)
include_once("Includes/simple_html_dom.inc.php");

//get URL content
$get_content = file_get_html($get_url); 

Getting the title works fine:

//Get Page Title 
        foreach($get_content->find('title') as $element) 
        {
            $page_title = $element->plaintext;
        }

However, when I loop over the span elements looking for a currency symbol to get the price, I get nothing.

    //Get Price
    foreach($get_content->find('span') as $element) 
    {

        $price = $element->plaintext;

        if (strpos($price, '$') !== FALSE)
            {
                $page_price = $price;
            }

        else { $page_price = '0.00';}
    }

Solution

  • This kind of works. Unfortunately, DOMNode->textContent also includes the contents of any nested <script> elements, so the scripts have to be stripped first. I don't know how to do this with "simple_html_dom", but I think it'd be easy to port ;) (it would surprise me if it's any smarter than DOMDocument, though, but who knows).

    Edit: updated the code to work around the <script> tag issue with DOMNode->textContent.

    <?php 
    error_reporting(E_ALL);
    $html = file_get_contents("http://rads.stackoverflow.com/amzn/click/B0081IDX84");
    $domd = new DOMDocument();
    // Suppress warnings about malformed real-world HTML.
    @$domd->loadHTML($html);
    $matches = array();
    // textContent includes the contents of nested <script> elements, so
    // remove them first. getElementsByTagName() returns a live list, so
    // copy it to an array before removing nodes; otherwise some are skipped.
    foreach (iterator_to_array($domd->getElementsByTagName("script")) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Collect the text of every <span> that contains a dollar sign.
    foreach ($domd->getElementsByTagName("span") as $node) {
        if (strpos($node->textContent, '$') !== false) {
            $matches[] = $node->textContent;
        }
    }
    if (php_sapi_name() === 'cli') {
        var_dump($matches);
    } else {
        echo '<pre>';
        ob_start();
        var_dump($matches);
        echo htmlentities(ob_get_clean());
        echo '</pre>';
    }

    You can see the code running live here: http://codepad.viper-7.com/y1b0y3
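
One more thing worth checking in the question's loop: the else branch resets $page_price to '0.00' for every span that has no '$', so a match survives only if it happens to be in the last span on the page. Below is a minimal sketch of one way around that, using DOMDocument as in the answer above: set the default once and stop at the first match. The function name find_first_price, the price pattern, and the sample markup are all made up for illustration; real pages may format prices differently.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// dollar amount found in any <span>, or '0.00' when there is none.
function find_first_price(string $html): string
{
    $doc = new DOMDocument();
    // Suppress warnings about malformed real-world HTML.
    @$doc->loadHTML($html);

    // Copy the live node list before removing <script> elements from it.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    foreach ($doc->getElementsByTagName('span') as $span) {
        // Assumed price format: "$", digits, optional ",ddd" groups and
        // optional cents, e.g. "$1,299.99".
        if (preg_match('/\$\s?(\d+(?:,\d{3})*(?:\.\d{1,2})?)/', $span->textContent, $m)) {
            return $m[1]; // stop at the first match
        }
    }

    return '0.00'; // default, used only when no span matched
}

// Made-up sample markup for illustration.
$sample = '<html><body><span>In stock</span>'
        . '<span class="price">$19.99</span>'
        . '<span>Free shipping</span></body></html>';
echo find_first_price($sample), PHP_EOL;
```

With the sample markup this prints 19.99 even though the price is not in the last span. The same idea should carry over to the simple_html_dom loop: assign '0.00' before the foreach, drop the else branch, and break once a span matches.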