perl, web-scraping, www-mechanize

WWW::Scripter package using 1.2GB of memory while being used for webscraping


I am not very familiar with this package. Using a profiler, I found that the use_plugin('JavaScript') method consumes a lot of memory. I swapped it for plugin('JavaScript'); memory consumption was lower, but I could not even get past the login form of the websites I am supposed to scrape.

Globally defined:

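use WWW::Scripter;

# $conf is defined elsewhere and holds the password under 'moxa_p'.
# Assumes the login page has already been loaded with $scripter->get(...).
my $scripter = WWW::Scripter->new();
$scripter->use_plugin('JavaScript');
if (my $form = $scripter->form_with_fields("Password")) {
  # Fill in the password field and submit the login form
  $form->value('Password', $conf->{'moxa_p'});
  $form->submit();
} else {
  print "dbg +> form 1.0 not found\n";
}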

I tried using the delete and undef keywords, but that does not help at all!


Solution

  • Reduce the stack of cached pages (WWW::Scripter, WWW::Mechanize)

    Use max_docs in WWW::Scripter or stack_depth in WWW::Mechanize. The WWW::Mechanize man page recommends setting it to 5 or 10.

    man WWW::Scripter

    max_docs
    The maximum number of document objects to keep in history (along with their corresponding request and response objects). If this is omitted, Mech's stack_depth + 1 will be used. This is off by one because stack_depth is the number of pages you can go back to, so it is one less than the number of recorded pages. max_docs considers 0 to be equivalent to infinity.

    man WWW::Mechanize

    "stack_depth => $value"
    Sets the depth of the page stack that keeps track of all the downloaded pages. Default is effectively infinite stack size. If the stack is eating up your memory, then set this to a smaller number, say 5 or 10. Setting this to zero means Mech will keep no history.
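
    Going by the two man page excerpts above, both limits can be passed as named arguments to the constructor. The following is a minimal sketch, not a tested fix for the 1.2GB case: the value 5 is simply taken from the "5 or 10" suggestion, and it assumes WWW::Scripter and its JavaScript plugin are installed.

    ```perl
    use strict;
    use warnings;
    use WWW::Scripter;
    use WWW::Mechanize;

    # Keep at most 5 document objects in WWW::Scripter's history.
    # Per the man page, max_docs => 0 would mean "unlimited".
    my $scripter = WWW::Scripter->new( max_docs => 5 );
    $scripter->use_plugin('JavaScript');

    # The equivalent limit for plain WWW::Mechanize:
    # remember at most 5 pages that can be gone back to.
    my $mech = WWW::Mechanize->new( stack_depth => 5 );
    ```

    Because max_docs counts recorded documents and stack_depth counts pages you can go back to, the two values are off by one relative to each other, as the WWW::Scripter man page notes.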