Tags: node.js, node-request

Rate Limit the Node.js Module Request


So I'm trying to build a data scraper in Node.js using the Request module. I'd like to limit the concurrency to 1 domain on a 20 ms cycle to work through 50,000 URLs.

When I execute the code, I'm DoS-ing the network with the 40 Gbps of bandwidth my system has access to. This creates both local and remote problems.

Running 5 concurrent scans on a 120 ms cycle for the 50k domains works out to 10,000 cycles × 0.12 s ≈ 20 minutes to finish the list (if I calculated correctly), and at least wouldn't cause any issues remotely.

The code I'm testing with:

var urls = // data from mongodb

urls.forEach(function (url) {
  // pseudocode: request the url, then process the response
  request(url, function (error, response, body) {
    // process
  });
});

The forEach call executes instantly, "queueing" every URL and trying to fetch them all at once. It seems impossible to add a delay to each iteration. All my Google searches only show how to rate limit incoming requests to your own server/API. The same thing happens with a plain for loop: I can't control how fast the iterations execute. I'm probably missing something, or the code logic is wrong. Any suggestions?
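For reference, `forEach` never waits for asynchronous work: it fires its callback for every element synchronously, so all 50,000 requests start at once. A plain `for...of` loop combined with `await`, by contrast, does pause between iterations. A minimal sketch of the difference, where `processUrl` is an illustrative placeholder for the real request-and-process step:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Placeholder for the actual HTTP request + processing.
async function processUrl(url) {
  return `processed ${url}`;
}

async function run(urls) {
  const results = [];
  for (const url of urls) {
    results.push(await processUrl(url)); // waits for completion
    await sleep(20);                     // 20 ms pause between requests
  }
  return results;
}

run(['a.com', 'b.com']).then(console.log);
```

This only gives sequential execution with a delay; for a fixed concurrency of N in flight at once, see the answer below.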


Solution

    1. To simplify your implementation, use async/await and Promises instead of callbacks.
    2. Use a package such as got or axios to make Promise-based requests.
    3. Use p-map or a similar helper from promise-fun.

    Here is an example copied from the p-map readme:

    const pMap = require('p-map');

    const urls = [
      'sindresorhus.com',
      'ava.li',
      'github.com',
      …
    ];

    console.log(urls.length);
    //=> 100

    const mapper = url => {
      return fetchStats(url); //=> Promise
    };

    pMap(urls, mapper, {concurrency: 5}).then(result => {
      console.log(result);
      //=> [{url: 'sindresorhus.com', stats: {…}}, …]
    });
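If you'd rather avoid a dependency, the same concurrency cap can be hand-rolled with plain Promises. The following is a sketch of the idea behind p-map's `concurrency` option, not its actual implementation; `mapLimit` and `worker` are hypothetical names:

```javascript
// Run `mapper` over `items` with at most `concurrency` promises in flight.
async function mapLimit(items, mapper, concurrency) {
  const results = new Array(items.length);
  let next = 0; // shared index of the next unclaimed item

  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index synchronously (JS is single-threaded)
      results[i] = await mapper(items[i]);
    }
  }

  // Start a fixed pool of workers that drain the shared index.
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, worker)
  );
  return results;
}
```

In practice, prefer p-map: it also handles errors, aborting, and edge cases that this sketch ignores.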