Tags: javascript, node.js, events, web-scraping, emit

How to make a node.js webscraper periodically check an endpoint for data updates?


I am writing a Discord bot which aggregates data from a third-party API.

There is a design pattern from discord.js which I want to follow for my web-scraping functions, wherein one instantiates a client object, and performs actions when the client emits specific events, like so:

const Discord = require('discord.js');
const client = new Discord.Client();

client.on('ready', () => {
  console.log(`Logged in as ${client.user.tag}!`);
});

client.on('message', msg => {
  if (msg.content === 'ping') {
    msg.reply('Pong!');
  }
});

client.login('token');

To my understanding this code will run indefinitely, performing actions each time a specific event is emitted, e.g. ready or message.

I cannot find out how such functionality is implemented. More specifically, I can't figure out how the discord client object continually looks for changes, and emits an event when it notices them.

The reason I want to emulate this design pattern is so that I can run one node.js application which will, say every 10 minutes, reach out to the API and see if there is new information, and log it into a database when there are changes.

My initial thought is something along these lines, but it crashes with an out-of-memory error.

const events = require("events");

class ScrapeEmitter extends events.EventEmitter {}
const scrapeEmitter = new ScrapeEmitter();

scrapeEmitter.on("timeExpired", () => console.log("call scraping code here"));

while (true) {
  setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
}

The end goal is to write the following in index.js and have it listen for Discord events while also scraping for data.

const scraper = require("./core/scraper"); // same module system as the require() below
const Discord = require('discord.js');
const client = new Discord.Client();

client.on('ready', () => {
  console.log(`Logged in as ${client.user.tag}!`);
});

client.on('message', msg => {
  if (msg.content === 'ping') {
    msg.reply('Pong!');
  }
});

client.login('token');
scraper.begin_scraping();

Solution

  • This portion of code

    while (true) {
      setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
    }
    

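    creates an unbounded number of timeouts. Because the while loop never returns control to the event loop, none of those timers ever gets a chance to fire; they simply pile up until the process runs out of memory. What you need to do is start a new timeout only after the previous one has fired. An example is: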

    function loop() {
      // Do the periodic work here, then schedule the next run.
      setTimeout(loop, 1500);
    }
    loop(); // start the cycle
    

    Each call schedules the next call of loop 1500 milliseconds (1.5 seconds) later, which in turn schedules the one after that, and so on.

    However, a simpler solution is to use setInterval(), which repeats the call for you. It looks like this:

    function loop() {
      // periodic work goes here
    }
    setInterval(loop, 1500); // calls loop every 1500 ms
    

    So, instead of writing

    while (true) {
      setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
    }
    

    Write

    setInterval(() => scrapeEmitter.emit("timeExpired"), 1500);
    

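    This removes the infinite loop and emits timeExpired every 1500 milliseconds. For the 10-minute check described in the question, use 10 * 60 * 1000 as the delay.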
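One caveat with setInterval() is that it fires on a fixed schedule whether or not the previous run has finished, so a slow asynchronous scrape could overlap with the next one. If that matters, the "start the next timeout only after the previous one has completed" idea can be extended to wait for the work itself. The following is a minimal sketch of that, not part of the original answer; repeatAfterCompletion and its parameters are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not from the original answer): run `task` repeatedly,
// waiting `delayMs` after each run finishes before starting the next one,
// so two runs never overlap.
function repeatAfterCompletion(task, delayMs) {
  let stopped = false;
  let timer = null;

  async function run() {
    try {
      await task(); // `task` may be synchronous or return a promise
    } catch (err) {
      console.error("task failed:", err); // keep the cycle alive on errors
    }
    if (!stopped) {
      timer = setTimeout(run, delayMs); // schedule the next run only now
    }
  }

  timer = setTimeout(run, delayMs); // first run happens after one delay

  // The returned function cancels any pending run and prevents new ones.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Calling the returned function stops the cycle, which is useful when shutting the bot down cleanly.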

    I'm just translating @Worthy Alpaca's answer into a comment. It's a community wiki, so I get no reputation
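To connect this back to the question's goal (check the API periodically and act only when something changed), here is a rough sketch of what a scraper built on EventEmitter and setInterval() might look like. Everything in it is an assumption made for illustration: the Scraper class, the fetchData callback, the "update" event name, and the use of JSON.stringify to detect changes are not from the question or the original answer, and this is not how discord.js itself is implemented.

```javascript
const { EventEmitter } = require("events");

// Hypothetical scraper: calls `fetchData` on a fixed interval and emits
// "update" only when the result differs from the previous poll.
class Scraper extends EventEmitter {
  constructor(fetchData, intervalMs = 10 * 60 * 1000) {
    super();
    this.fetchData = fetchData;   // assumed: async function returning JSON-serializable data
    this.intervalMs = intervalMs; // default: 10 minutes, as in the question
    this.lastSnapshot = null;
    this.timer = null;
  }

  // One poll: fetch, compare with the previous result, emit on change.
  async poll() {
    try {
      const data = await this.fetchData();
      const snapshot = JSON.stringify(data);
      if (snapshot !== this.lastSnapshot) {
        this.lastSnapshot = snapshot;
        this.emit("update", data);
      }
    } catch (err) {
      // emit("error") throws if no "error" listener is attached,
      // so callers should register one.
      this.emit("error", err);
    }
  }

  // Poll once immediately, then on every interval tick.
  start() {
    if (this.timer) return; // already running
    this.poll();
    this.timer = setInterval(() => this.poll(), this.intervalMs);
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}
```

In index.js, one would then create a Scraper with a function that requests the third-party API, attach listeners with scraper.on("update", ...) and scraper.on("error", ...), and call scraper.start() next to client.login('token'). Both the Discord client and the scraper share the same event loop, so neither blocks the other.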