javascript node.js web-scraping scrapy cheerio

Scrapy-like tool for Node.js?


I would like to know whether there is something like Scrapy for Node.js. If not, what do you think of simply downloading the page and parsing it with cheerio? Is there a better way?


Solution

  • I haven't seen a solution for crawling and indexing whole websites in Node.js that is as strong as Scrapy in Python, so personally I use Scrapy for crawling websites.
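
For a rough sense of what the crawling side involves in Node.js, here is a minimal sketch of a breadth-first crawl loop. It is not how Scrapy is implemented. The names `crawl`, `fetchPage`, `extractLinks`, and `fakeSite` are hypothetical and made up for illustration, and the in-memory site stands in for real HTTP requests and HTML parsing (a cheerio-based link extractor could be plugged in as `extractLinks`).

```javascript
'use strict';

// Breadth-first crawl: a URL frontier (queue) plus a set of URLs already queued.
// fetchPage and extractLinks are hypothetical, caller-supplied functions.
async function crawl(startUrl, fetchPage, extractLinks, maxPages = 100) {
  const start = new URL(startUrl);
  const queue = [start.href];          // frontier of URLs still to fetch
  const seen = new Set([start.href]);  // every URL ever queued, to avoid repeats
  const pages = [];                    // results in visit order

  while (queue.length > 0 && pages.length < maxPages) {
    const url = queue.shift();
    const html = await fetchPage(url);
    pages.push({ url, html });

    for (const href of extractLinks(html)) {
      let next;
      try {
        next = new URL(href, url);     // resolve relative links against the page URL
      } catch {
        continue;                      // skip malformed hrefs
      }
      next.hash = '';                  // a fragment points into the same page
      if (next.origin !== start.origin) continue; // stay on the start site
      if (seen.has(next.href)) continue;
      seen.add(next.href);
      queue.push(next.href);
    }
  }
  return pages;
}

// Tiny in-memory "site" so the sketch runs without network access.
// Each "page body" is just a JSON list of links, standing in for real HTML.
const fakeSite = {
  'https://example.com/': ['/a', '/b', 'https://other.example/x'],
  'https://example.com/a': ['/b', '/c#top'],
  'https://example.com/b': ['/'],
  'https://example.com/c': [],
};
const fetchPage = async (url) => JSON.stringify(fakeSite[url] || []);
const extractLinks = (html) => JSON.parse(html);

const crawlResult = crawl('https://example.com/', fetchPage, extractLinks);
crawlResult.then((pages) => console.log(pages.map((p) => p.url)));
```

Run against the fake site, this visits `/`, `/a`, `/b`, and `/c` once each and skips the link to the other host. Politeness delays, retries, robots.txt handling, and concurrency, which Scrapy provides out of the box, are all left out here.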

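    But for scraping data from individual pages there is CasperJS in the JavaScript world (it runs on top of PhantomJS rather than as a regular Node.js module). It is a very nice solution. It also works for AJAX websites, e.g. AngularJS pages, because it drives a headless browser. Scrapy on its own does not execute JavaScript, so it cannot scrape content that is loaded via AJAX. So for scraping data from one or a few pages I prefer to use CasperJS.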

    Cheerio is much faster than CasperJS, but it doesn't work with AJAX pages, and code written with it isn't as well structured as with CasperJS. So I prefer CasperJS even in cases where cheerio would work.

    CasperJS example in CoffeeScript:

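    # Assumes `casper`, `params`, `queryUrl`, and `queryData` are defined elsewhere.
    # Open the login page, fill in the first form, and submit it (the `true` flag).
    casper.start 'https://reports.something.com/login', ->
      this.fill 'form',
        username: params.username
        password: params.password
      , true

    # POST the query, then click the first <input> on the resulting page.
    casper.thenOpen queryUrl, {method:'POST', data:queryData}, ->
      this.click 'input'

    casper.then ->
      # Helper: text of the given column's cell in rows whose bgcolor is #AFC5E4.
      get = (number) =>
        value = this.fetchText("tr[bgcolor='#AFC5E4'] > td:nth-of-type(#{number})").trim()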