perl dom mechanize xpath


I'm trying to scrape some data from the metacriti* website using WWW::Mechanize, but I'm getting no output.

Here's my code with an example URL:

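use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $metaURL = "http://www.metacriti*.com/game/pc/dota-2";

# With autocheck enabled, get() dies on its own if the request fails
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get($metaURL);

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($mech->content);  # unlike parse(), this also signals end of input

my @nodes = $tree->findnodes(q{//*[@id="main"]//a[contains(./@href, "user-reviews")]/span[@class="score_value"]});

print $_->string_value, "\n" foreach @nodes;  # text of each matching node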

The @nodes array seems to be empty. My XPath looks correct to me, and since I'm using the same syntax in another script that works, I really can't figure out what is wrong with this one.

Also, since this is just the beginning, maybe you can suggest another easy way to scrape/parse websites, if there's a better one :)

Thank you in advance


Solution

  • The HTML seems to be really bad. If you print $tree->findnodes( '//div[@id="main"]')->[0]->as_HTML you get a very bare div:

    <div class="col main_col" id="main"><div itemscope="itemscope" itemtype="http://schema.org/SoftwareApplication"></div></div>
    

    This div does not contain any a element, which explains the empty result you got.

    I tried using tidy to pretty-print the HTML, but it choked on the file.

    If you drop the div prefix and use q{//a[contains(./@href, "user-reviews")]/span[@class="score_value"]} instead, you do get a result: 7.9 in this case.
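
To compare the two queries without hitting the site, here is a minimal sketch. The HTML in it is a hand-written stand-in, not the real page: it assumes a tree where the score link ended up outside the #main div, roughly what the parser produced from the malformed markup. The "side" class and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for the page: the score link is NOT a descendant
# of #main, mimicking the tree built from the broken HTML.
my $html = <<'HTML';
<html><body>
<div class="col main_col" id="main"><div itemscope="itemscope"></div></div>
<div class="side">
  <a href="/game/pc/dota-2/user-reviews"><span class="score_value">7.9</span></a>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Step 1: look at what the parser actually built under #main.
my ($main) = $tree->findnodes('//div[@id="main"]');
my $main_html = $main ? $main->as_HTML : '(no #main found)';

# Step 2: run the query anchored on #main and the relaxed one.
my $anchored = q{//*[@id="main"]//a[contains(@href, "user-reviews")]/span[@class="score_value"]};
my $relaxed  = q{//a[contains(@href, "user-reviews")]/span[@class="score_value"]};

my @anchored_hits = map { $_->string_value } $tree->findnodes($anchored);
my @relaxed_hits  = map { $_->string_value } $tree->findnodes($relaxed);

print "main: $main_html\n";
printf "anchored: %d match(es)\n", scalar @anchored_hits;
print  "relaxed:  @relaxed_hits\n";

$tree->delete;  # free the tree explicitly
```

On this stand-in the anchored query matches nothing and the relaxed one finds the score. Dumping a node with as_HTML before trusting an XPath is a cheap way to check whether the parsed tree looks the way the page source suggests.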