I'm trying to scrape some data from the metacriti* website using WWW::Mechanize, but I'm getting no output.
Here's my code, with an example URL:
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $metaURL = "http://www.metacriti*.com/game/pc/dota-2";
my $mech = WWW::Mechanize->new();
$mech->get($metaURL) or die "unable to get $metaURL";
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($mech->content);
$tree->eof;    # signal end of input so the parser finishes building the tree
my @nodes = $tree->findnodes(q{//*[@id="main"]//a[contains(./@href, "user-reviews")]/span[@class="score_value"]});
print $_->string_value, "\n" foreach @nodes;    # print each score's text
The @nodes array seems to be empty. My XPath looks correct, and since I'm using the same syntax in another, working script, I really can't figure out what is wrong with this one...
Also, since this is just the beginning, could you suggest another easy way to scrape/parse websites, if there's a better one? :)
Thank you in advance.
The HTML seems to be really bad. If you print $tree->findnodes('//div[@id="main"]')->[0]->as_HTML, you get a very bare div:
<div class="col main_col" id="main"><div itemscope="itemscope" itemtype="http://schema.org/SoftwareApplication"></div></div>
This indeed does not contain any a element, which explains the empty result you got.
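To confirm this diagnosis on your own copy of the page, you can count the a elements the parser placed under #main and compare that with the count for the whole document. This is a minimal sketch: count_links is a hypothetical helper name, and it assumes you pass in the HTML string you already have (for example $mech->content). The inline markup is made up, only to show the output format.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical diagnostic: how many <a> elements did the parser put inside
# the element with id="main", and how many exist in the whole tree?
sub count_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;
    # Assigning to an empty list forces list context; the assignment in
    # scalar context then yields the number of nodes found.
    my $in_main = () = $tree->findnodes('//*[@id="main"]//a');
    my $total   = () = $tree->findnodes('//a');
    $tree->delete;    # free the tree (HTML::Element trees hold circular refs)
    return ($in_main, $total);
}

# Made-up sample markup: one link inside #main, one outside it.
my ($in_main, $total) = count_links(<<'HTML');
<html><body><div id="main"><a href="/x">x</a></div><a href="/y">y</a></body></html>
HTML
print "links inside #main: $in_main, links overall: $total\n";
```

On the page from the question you would expect the first number to be 0 while the second is not, which is consistent with the bare div shown above.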
I tried using tidy to pretty-print the HTML, but it barfed on the file.
If you forget about the div and use q{//a[contains(./@href, "user-reviews")]/span[@class="score_value"]}, you will get a result, though: 7.9 in this case.
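Wrapped up as a reusable piece, the relaxed query might look like the sketch below. extract_user_scores is a hypothetical helper name, and the sample markup (including its score) is invented purely to exercise the XPath offline; in the original script you would call it as extract_user_scores($mech->content).

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: parse an HTML string and return the text of every
# user-review score, using the relaxed XPath (no #main anchor).
sub extract_user_scores {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);
    $tree->eof;    # tell the parser the document is complete
    my @scores = map { $_->string_value }
        $tree->findnodes(q{//a[contains(./@href, "user-reviews")]/span[@class="score_value"]});
    $tree->delete;    # safe: @scores holds plain strings, not nodes
    return @scores;
}

# Made-up sample markup, only to show the query matching.
my $sample = <<'HTML';
<html><body>
<div id="main"><a href="/game/pc/example/user-reviews"><span class="score_value">8.1</span></a></div>
</body></html>
HTML

print "$_\n" for extract_user_scores($sample);
```

Dropping the #main anchor makes the query independent of how the parser nests the broken markup, at the cost of also matching any other user-reviews score links elsewhere on the page, so you may want to take only the first result.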