I'm running a scraper on localhost and am having trouble scraping a 2.50 MB HTML file stored in a web directory on my computer.
Right now I have
error_reporting(E_ALL);
ini_set('display_errors', '1');
require_once 'simplehtmldom-2rc2/HtmlWeb.php';
use simplehtmldom\HtmlWeb;
$doc = new HtmlWeb();
$html = $doc->load('http://localhost/onetab/test.html');
I have a file called test.html, and when I add just one more character to it, my scraper fails to fetch the file. Given the memory limit and memory usage stated above, how can adding one extra character to test.html cause the ->load() call to fail, so that $html ends up blank (or null)?
I'm using Simple HTML DOM version 2 RC2.
Adding the following lines does not help:
set_time_limit(0); // 0 is infinite, or it could be 5000
ini_set('max_input_time', 5000 );
ini_set('max_execution_time', 5000 );
ini_set('max_input_vars', 5000 );
ini_set('max_input_nesting_level', 5000 );
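A minimal sketch to check whether the failure is size-related, reusing the same localhost URL as above (the file_get_contents call here is only for comparison and is not part of the scraper):
// Sketch: compare a raw fetch of the page against what the parser returns.
require_once 'simplehtmldom-2rc2/HtmlWeb.php';

use simplehtmldom\HtmlWeb;

$url = 'http://localhost/onetab/test.html';

// Fetch the raw bytes without the parser involved.
$raw = file_get_contents($url);
echo 'Raw fetch: ' . ($raw === false ? 'failed' : strlen($raw) . ' bytes') . "\n";

// If the raw fetch succeeds but the parser returns null, the limit is
// inside the library rather than in PHP's own memory or time settings.
$doc  = new HtmlWeb();
$html = $doc->load($url);
var_dump($html === null); // true when the load failed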
The Simple HTML DOM version 2 RC2 library ships a constants.php file with settings you can change. The one to adjust here is the MAX_FILE_SIZE constant, which caps how large a file the parser will accept; anything over the cap is rejected, which is why one extra character can tip a file over the edge.
To make it accept a 9 MB file I set the value to 1024 * 1024 * 9. You can set it to a plain number or to any arithmetic expression that works out to the size you want, for example:
$chosenvalue = 1024 * 1024 * 9; // 9 MB file (bytes -> kilobytes -> megabytes)
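Concretely, the edit in constants.php amounts to replacing the default number with the larger value. The exact surrounding code (namespace, defined() guard, default value) varies between releases, so treat this as a sketch rather than a verbatim copy of the file:
namespace simplehtmldom;

// constants.php (excerpt): raise the parser's file-size cap.
// 1024 * 1024 * 9 = 9437184 bytes, i.e. 9 MB.
defined(__NAMESPACE__ . '\MAX_FILE_SIZE') ||
    define(__NAMESPACE__ . '\MAX_FILE_SIZE', 1024 * 1024 * 9);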
These settings are documented in the manual/api/constants.md file that ships with the library. Because the library is still a release candidate awaiting a stable release, the bundled offline documentation is not fully fleshed out, so it is worth reading the relevant documentation page online as well.