node.js, meteor, web-scraping, cheerio

Calling cheerio.load inside each loop


This is a basic server-side JS scraper in Meteor.

The pattern is fairly simple: the script finds certain links, loads the content from each of them, and stores that content in a variable.

The script keeps crashing when cheerio.load is called inside the loop. What's the catch? What's the best implementation for this purpose?

  Meteor.methods({
    loadPage: function () {
      result = Meteor.http.get("http://url.com");
      $ = cheerio.load(result.content);
      $('.class').each(function(i,elem){
        var link = $(this).attr('href');
        var title = $(this).text();
        var $ = cheerio.load(Meteor.http.get(link).content);
        var postContent = $('.classOnLoadedPage');
        Images.insert(
          {
            link: link,
            title: title,
            postContent:  postContent
          });
      });
    }
  });

Solution

  • I ran into exactly the same problem today. It turns out to be a problem with cheerio itself: a rather old version has this bug. With a newer version it works.

    The most downloaded cheerio package on atmospherejs, mrt:cheerio, wraps cheerio 0.12.3, while the current version on npm is cheerio 0.19.0.

    Add rclai89:cheerio instead of mrt:cheerio (meteor remove mrt:cheerio, then meteor add rclai89:cheerio). It delivers cheerio 0.18.0, and with that version calling load inside the loop works perfectly.
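
Separately from the cheerio version, the code in the question also redeclares `$` with `var` inside the `.each` callback. JavaScript hoists that declaration to the top of the callback, so the `$(this)` call on the callback's first line runs while the inner `$` is still undefined. The sketch below reproduces the effect without cheerio, so it runs anywhere. `fakeLoad` is a hypothetical stand-in for `cheerio.load`, and `shadowed` and `renamed` are illustrative names, not part of any library.

```javascript
// Hypothetical stand-in for cheerio.load: returns a tiny selector function.
// The real cheerio.load returns a far richer object; this only shows scoping.
function fakeLoad(html) {
  return function (selector) {
    return {
      text: function () { return html + ':' + selector; }
    };
  };
}

// Mirrors the question: the inner `var $` is hoisted to the top of the
// callback, so the first `$(...)` call sees `undefined` and throws.
function shadowed() {
  var $ = fakeLoad('index');
  var outcome;
  [1].forEach(function () {
    try {
      var title = $('.class').text(); // `$` is the hoisted, still-undefined inner variable
      var $ = fakeLoad('page');       // this declaration shadows the outer `$`
      outcome = title;
    } catch (e) {
      outcome = e.name;
    }
  });
  return outcome;
}

// Same flow, but the per-page loader gets its own name, so the outer `$`
// stays usable inside the callback.
function renamed() {
  var $ = fakeLoad('index');
  var outcome;
  [1].forEach(function () {
    var title = $('.class').text();
    var $page = fakeLoad('page');
    outcome = title + ' | ' + $page('.classOnLoadedPage').text();
  });
  return outcome;
}

console.log(shadowed()); // TypeError
console.log(renamed());  // index:.class | page:.classOnLoadedPage
```

Applied to the method in the question, this would probably mean naming the inner result something like `$page` and inserting `$page('.classOnLoadedPage').html()` or `.text()` rather than the cheerio object itself. Whether this alone stops the crash on the old package version is not something the sketch can confirm.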