javascript perl web-scraping web-crawler www-mechanize

Trouble crawling/scraping webpages that use JavaScript with Perl


I've been trying to teach myself how to crawl and scrape different websites. I have a good feel for crawling/scraping, but only for websites that mainly use HTML. Now I'm working with this link: https://intel.taleo.net/careersection/10000/jobsearch.ftl

I'm using Perl (with WWW::Mechanize) for the following task: I want to write a crawler/scraper that clicks the "United States" checkbox on the left (filtering the results) and then collects the titles of all jobs. However, I couldn't find a way to navigate to this radio button using Perl. Can someone get me started on this? (Example code would be helpful.)


Solution

  • You need to analyze the page and see how this radio button is implemented, so that you can use WWW-Mechanize to emulate the JavaScript code, if there is any JavaScript code behind it.

    Perl also has easier options for handling JavaScript. The following crawling modules handle JavaScript out of the box:

    1. WWW-Mechanize-Firefox, which automates Firefox
    2. WWW-Mechanize-PhantomJS, which is based on the PhantomJS browser and can handle JavaScript
    3. WWW::Selenium, which uses Selenium
    4. WWW::HtmlUnit, which is based on the Java HtmlUnit library and can handle JavaScript
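
As a starting point for the "analyze the page" step, here is a minimal sketch that lists the checkbox and radio inputs present in the static HTML that plain WWW::Mechanize receives. The helper name static_toggles is made up for illustration, and the sketch assumes the controls sit inside <form> elements. If the "United States" control does not show up in the output, it is most likely created by JavaScript at runtime, which would point towards one of the browser-backed modules above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;
use WWW::Mechanize;

# Hypothetical helper: return one "type name" string per checkbox or radio
# input found in the forms of the static HTML. Controls that JavaScript adds
# at runtime will not appear here.
sub static_toggles {
    my ($html, $base_url) = @_;
    my @found;
    for my $form (HTML::Form->parse($html, $base_url)) {
        for my $input ($form->inputs) {
            my $type = $input->type;
            next unless $type eq 'checkbox' || $type eq 'radio';
            push @found, join ' ', $type, $input->name // '(unnamed)';
        }
    }
    return @found;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($ARGV[0]);
    my @toggles = static_toggles($mech->content, $mech->uri);
    print "$_\n" for @toggles;
    print "No checkbox or radio inputs in the static HTML forms\n"
        unless @toggles;
}
```

Run it with the page URL as the only argument. An empty result does not prove that JavaScript is involved, but it is a cheap first check before switching tools.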