
Simplehtmldom - limit content size for get_html?


I'm using simplehtmldom to get the titles of some links, and I'm wondering if I can limit the size of the downloaded content. Instead of downloading the whole page, I'd like to fetch just the first 20 lines or so of code, enough to get the title.

Right now I'm using this:

  $html = file_get_html($row['current_url']);

  $e = $html->find('title', 0);
  $title = $e->innertext;
  echo $e->innertext . '<br><br>';

thanks


Solution

  • Unless I've missed something, that's not how file_get_html works. It retrieves the full contents of the page.

    In other words, it has to read the entire page before find() can locate what you're looking for in the next step.

    Now, if you were to use:

    $section = file_get_contents('http://www.the-URL.com/', false, null, 0, 444);
    

    You could probably isolate the first 20 lines of HTML, as long as the page you are fetching always follows the same layout from the <!DOCTYPE html> down to the </head><body> or <title></title>.

    Then you could grab the first 20 lines or so, again as long as the size of the <head> section stays roughly the same.
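    As an aside, one way to cap the download without relying on file_get_contents() offsets is to open a stream and read only the first chunk yourself. A minimal sketch, where fetch_head_section() is a name invented for this example (not part of simplehtmldom):

    ```php
    <?php
    // Sketch: read only the first $limit bytes of a page instead of the
    // whole document. stream_get_contents() stops once $limit bytes have
    // been read, so the rest of the body is never pulled down.
    // fetch_head_section() is a hypothetical helper for illustration.
    function fetch_head_section($url, $limit = 444)
    {
        $fp = @fopen($url, 'r');
        if ($fp === false) {
            return false; // could not open the URL
        }
        $section = stream_get_contents($fp, $limit);
        fclose($fp);
        return $section;
    }
    ```

    From there, $html = str_get_html(fetch_head_section($the_url)); works the same as with file_get_contents().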

    Then use:

    $html = str_get_html($section);
    

    And then from there use your find():

    $html->find('title', 0);
    


    EDIT:

    include('simple_html_dom.php');
    
    $the_url = 'http://www.the-URL.com/';
    
    // Read 444 characters starting from the 1st character
    $section = file_get_contents($the_url, false, null, 0, 444);
    $html = str_get_html($section);
    
    if (!$html || !($e = $html->find('title', 0))) {
        // Read the next 444 characters, starting from the 445th character
        $section = file_get_contents($the_url, false, null, 444, 444);
        $html = str_get_html($section);
        $e = $html ? $html->find('title', 0) : false;
    }
    
    if ($e) {
        $title = $e->innertext;
        echo $title . '<br><br>';
    }
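    A variation on the edit above, offered only as a sketch: instead of guessing that the title lands in the first 444 or 888 characters, keep reading fixed-size chunks until a closing </title> tag appears (or a sanity limit is hit). read_until_title() is an invented name for this example:

    ```php
    <?php
    // Sketch: accumulate chunks from the stream until </title> shows up,
    // so the title is found regardless of how long the <head> section is.
    // read_until_title() is a hypothetical helper, not part of simplehtmldom.
    function read_until_title($url, $chunk = 444, $max = 8192)
    {
        $fp = @fopen($url, 'r');
        if ($fp === false) {
            return false;
        }
        $buffer = '';
        while (strlen($buffer) < $max) {
            $data = fread($fp, $chunk);
            if ($data === false || $data === '') {
                break; // read error or end of stream
            }
            $buffer .= $data;
            if (stripos($buffer, '</title>') !== false) {
                break; // we have the whole <title> element
            }
        }
        fclose($fp);
        return $buffer;
    }
    ```

    Then $html = str_get_html(read_until_title($the_url)); followed by the same $html->find('title', 0) call as above.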