I'm writing a basic server-side JavaScript scraper in Meteor.
The pattern is fairly simple: the script finds certain links, loads the content from each of them, and stores that content.
The script keeps crashing when I call cheerio.load inside the loop. What's the catch? What's the best implementation for this purpose?
    Meteor.methods({
      loadPage: function () {
        result = Meteor.http.get("http://url.com");
        $ = cheerio.load(result.content);
        $('.class').each(function(i,elem){
          var link = $(this).attr('href');
          var title = $(this).text();
          var $ = cheerio.load(Meteor.http.get(link).content);
          var postContent = $('.classOnLoadedPage');
          Images.insert(
          {
            link: link,
            title: title,
            postContent: postContent
          });
        });
      }
    });
I ran into exactly the same problem today. It turns out to be a problem with cheerio itself: a rather old version of it has this bug. You have to use a newer version, and then it works.
The most downloaded cheerio package on atmospherejs, mrt:cheerio, wraps cheerio 0.12.3, while the current version on npm is cheerio 0.19.0. Add rclai89:cheerio instead of mrt:cheerio and it will deliver cheerio 0.18.0; with this version, load within a loop works perfectly.
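For reference, here is a minimal sketch of loading a second document inside the loop. It is only an illustration, not the code from the question: scrapeLinks is a hypothetical helper, and load and fetchHtml are assumed parameters that I inject so the function can be exercised outside Meteor. I also give the inner cheerio instance its own name ($page) rather than redeclaring $ inside the callback, since a `var $` there is hoisted and would leave $ undefined at the earlier `$(this)` calls regardless of the cheerio version. Using .html() to store the post content as a string is likewise my assumption.

```javascript
// Hypothetical helper (not from the question): walks the links on an index
// page and loads each linked page with its own cheerio instance.
//   indexHtml - HTML string of the index page
//   load      - assumed to behave like cheerio.load
//   fetchHtml - assumed function taking a URL and returning an HTML string
function scrapeLinks(indexHtml, load, fetchHtml) {
  var $index = load(indexHtml); // instance for the index page
  var posts = [];

  $index('.class').each(function (i, elem) {
    var link = $index(elem).attr('href');
    var title = $index(elem).text();

    // Separate name for the inner instance, so $index is never shadowed.
    var $page = load(fetchHtml(link));

    posts.push({
      link: link,
      title: title,
      // Store a plain string rather than a cheerio object (assumption).
      postContent: $page('.classOnLoadedPage').html()
    });
  });

  return posts;
}
```

Inside the Meteor method you would presumably call it as scrapeLinks(Meteor.http.get("http://url.com").content, cheerio.load, function (link) { return Meteor.http.get(link).content; }) and then insert each returned object into the collection.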