node.js terminal raspberry-pi puppeteer

How to send a signal from a terminal window to a web scraper running on Puppeteer/Node.js on a Raspberry Pi


I am running a simple web scraper written in Puppeteer / Node.js on a Raspberry Pi. It downloads data from a website at 6pm and 6am every day. Every so often, say once a week, I'd like to send it a command or signal telling it to download the data immediately. Presumably this would be a command from another terminal window. How do I do this? I could write a small file and have the program constantly watch for it, but that seems very crude. Is there a better way? Something simple please, as I'm a bit of a novice!


Solution

  • Keeping in mind that I don't actually recommend doing this in most cases, as discussed in my previous answer, here's a contrived example to illustrate the readline approach.

    const puppeteer = require("puppeteer"); // ^22.2.0
    const readline = require("readline");
    const {setTimeout} = require("node:timers/promises");
    
    // Read newline-delimited commands from stdin
    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });
    
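    // Launch a fresh browser, load the URL and return the page title;
    // the browser is closed even if navigation fails
    const getPageTitle = async (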
      url = "https://en.wikipedia.org/wiki/Special:Random"
    ) => {
      let browser;
    
      try {
        browser = await puppeteer.launch();
        const [page] = await browser.pages();
        await page.goto(url, {waitUntil: "domcontentloaded"});
        return await page.title();
      }
      finally {
        await browser?.close();
      }
    };
    
    // On-demand work: treat each line received on stdin as a URL to scrape
    rl.on("line", async line => {
      try {
        console.log(`  Received command from stdin: \`${line}\``);
        console.log("  Command result:", await getPageTitle(line));
      }
      catch (err) {
        console.error(err);
      }
    });
    
    // Scheduled work: scrape a random page, wait 20 seconds, repeat forever
    (async () => {
      for (;;) {
        try {
          console.log(await getPageTitle());
        }
        catch (err) {
          console.error(err);
        }
        await setTimeout(20_000);
      }
    })();
    
    // execute with: `cat | node script.js` to enable the below behavior
    console.log(
      `to send command: echo 'https://www.stackoverflow.com' > /proc/${process.pid}/fd/0`
    );
    console.log("_".repeat(80));
    

    This prints a random Wikipedia page title every 20 seconds while also listening on stdin for additional URLs to print titles for. With the script running in a single terminal, type https://stackoverflow.com and press Enter to see it print Stack Overflow - Where Developers Learn, Share, & Build Careers, for example.
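The 20-second delay is only there to make the demo quick to watch. For the 6am/6pm schedule from the question, one possible approach is to compute how long to sleep until the next run. This is a minimal sketch, not part of the original answer: `msUntilNextRun` and its `hours` parameter are hypothetical names, and it assumes the Pi's local time zone is the one you care about.

```javascript
// Hypothetical helper: milliseconds from `now` until the next scheduled hour
// (local time). Defaults to the 6am and 6pm runs described in the question.
const msUntilNextRun = (now = new Date(), hours = [6, 18]) => {
  const delays = [];

  // Check each scheduled hour today and tomorrow, keep the ones still ahead
  for (const dayOffset of [0, 1]) {
    for (const hour of hours) {
      const candidate = new Date(now);
      candidate.setDate(candidate.getDate() + dayOffset);
      candidate.setHours(hour, 0, 0, 0);

      if (candidate > now) {
        delays.push(candidate - now);
      }
    }
  }

  return Math.min(...delays);
};
```

Under those assumptions, the loop could call `await setTimeout(msUntilNextRun())` in place of `await setTimeout(20_000)`.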

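    Now, if you run it with cat | node script.js to provide a pipe as described here, you can send commands to it from another terminal with echo 'https://www.stackoverflow.com' > /proc/{pid}/fd/0 where {pid} is printed dynamically by the program at startup, or obtainable with pgrep node. The pipe matters: without it, stdin is the terminal itself, and text written to /proc/{pid}/fd/0 is typically just displayed in that terminal rather than read by the script as input.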
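Since anything written to the pipe ends up in page.goto, it may be worth checking each line before acting on it. The following is a minimal sketch of that idea, not part of the original answer: `parseUrlCommand` is a hypothetical helper name, and accepting only http and https URLs is an assumption about what the scraper should allow.

```javascript
// Hypothetical helper: returns a normalized URL string if the line looks like
// an http(s) URL, or null for anything else (blank lines, typos, other schemes).
const parseUrlCommand = line => {
  try {
    const url = new URL(line.trim());
    return ["http:", "https:"].includes(url.protocol) ? url.href : null;
  }
  catch {
    // new URL() throws a TypeError for strings it cannot parse
    return null;
  }
};
```

In the "line" handler, the result could be checked first, for example `const url = parseUrlCommand(line); if (url) { console.log(await getPageTitle(url)); }`, so that a stray or malformed line is ignored instead of launching a browser.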