javascript node.js request

Extract all hyperlinks (from an external website) using Node.js and request


Right now our app writes the source code of nodejs.org to the console. We'd like it to write all hyperlinks of nodejs.org instead. Maybe we need just one line of code to get the links from body.

app.js:

var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

var request = require("request");



request("http://nodejs.org/", function (error, response, body) {
    if (!error)
        console.log(body);
    else
        console.log(error);
});

Solution

  • You are probably looking for [jsdom][1], [jquery][2], or [cheerio][3]. What you are doing is called screen scraping: extracting data from a site. jsdom and jquery offer a complete set of tools, but cheerio is much faster.

    Here is a cheerio example:

    var request = require('request');
    var cheerio = require('cheerio');
    var searchTerm = 'screen+scraping';
    var url = 'https://www.bing.com/search?q=' + searchTerm;
    request(url, function (err, resp, body) {
      if (err) return console.log(err); // stop on network errors
      var $ = cheerio.load(body);       // parse the HTML into a jQuery-like API
      var links = $('a');               // select all hyperlinks
      links.each(function (i, link) {
        // print the link text followed by its href
        console.log($(link).text() + ':\n  ' + $(link).attr('href'));
      });
    });
    

    You choose whatever is best for you.

    [1]: https://npmjs.org/package/jsdom
    [2]: https://npmjs.org/package/jquery
    [3]: https://npmjs.org/package/cheerio
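
    One thing to watch for: the `href` values you get from `$(link).attr('href')` are often relative (for example `/about`), and some `<a>` tags have no `href` at all. Below is a minimal sketch of how you might turn them into absolute URLs, assuming Node.js 10 or later, where the built-in `URL` class is available as a global. `toAbsoluteLinks` is a hypothetical helper name, not part of request or cheerio.

    ```javascript
    // Hypothetical helper: resolve scraped href values against the page URL.
    // Assumes Node.js 10+, where URL is a global.
    function toAbsoluteLinks(hrefs, baseUrl) {
      var result = [];
      hrefs.forEach(function (href) {
        if (!href) return; // <a> tags without an href give undefined
        try {
          // new URL(relative, base) resolves the way a browser would
          result.push(new URL(href, baseUrl).href);
        } catch (e) {
          // skip values that cannot be parsed as a URL
        }
      });
      return result;
    }

    // Sample input standing in for hrefs collected in the cheerio loop
    var sample = ['/about', 'docs/', 'https://example.com/x', undefined];
    console.log(toAbsoluteLinks(sample, 'http://nodejs.org/'));
    ```

    To use it with the cheerio example, push each `$(link).attr('href')` into an array inside the `each` callback and pass that array together with the URL you requested.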