I am trying to parse a Wikipedia page, and for some reason the code below works for every Wikipedia page I have tried except the Apple one!
include ('simple_html_dom.php');
$url = "http://en.wikipedia.org/wiki/Apple_Inc.";
$html = file_get_html($url);
For the Apple page, strlen($html) returns 0.
Note: the same code works fine when $url is set to other Wikipedia pages, e.g. Microsoft (http://en.wikipedia.org/wiki/Microsoft) or Diageo (http://en.wikipedia.org/wiki/Diageo).
I want to use file_get_html so that I get a DOM object I can process further.
Change the MAX_FILE_SIZE constant in simple_html_dom.php to a larger value, e.g.
define('MAX_FILE_SIZE', 800000);
and you should be good to go. :) That is why you got 0 for the Apple page: its HTML is longer than MAX_FILE_SIZE, so file_get_html() hits this check and returns false, and strlen(false) is 0:
if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
{
return false;
}
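To see why raising the limit helps, here is a minimal sketch that mimics the check above with a hypothetical guarded_load() function (it is not part of simple_html_dom). The 700000-byte page and the 600000/800000 limits are made-up illustrative numbers, not the real size of the Apple article or the library's default.

```php
<?php
// guarded_load() is a hypothetical stand-in for the size check inside
// file_get_html(): it returns false when the input is empty or too large.
function guarded_load($contents, $maxSize)
{
    if (empty($contents) || strlen($contents) > $maxSize) {
        return false;
    }
    return $contents;
}

// Illustrative page size only, not the actual size of the Apple_Inc. article.
$page = str_repeat('x', 700000);

$tooSmall  = guarded_load($page, 600000); // limit below the page size
$bigEnough = guarded_load($page, 800000); // limit raised, like define('MAX_FILE_SIZE', 800000)

var_dump($tooSmall);          // bool(false)
var_dump(strlen($tooSmall));  // int(0): strlen(false) is 0, the value seen in the question
var_dump(strlen($bigEnough)); // int(700000)
```

In real code, checking `$html === false` right after calling file_get_html() makes the failure explicit instead of letting it surface later as a 0 from strlen().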