Tags: javascript, web-scraping, puppeteer

Puppeteer "Target Close" crash at random points


I have a small Puppeteer script that should scrape a specific website, extract some data from it, step through the pagination, and then print the data. For some reason, it crashes at a random point on each run with the error shown below.

import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  page.setDefaultNavigationTimeout(10 * 60 * 1000);

  await page.goto("https://starti.ge/tickets.php?category=1");
  await page.setViewport({ width: 1080, height: 1024 });

  function delay(time) {
    return new Promise(function (resolve) {
      setTimeout(resolve, time);
    });
  }

  const extractDataFromPage = async () => {
    // Select all question containers
    const containers = await page.$$("#examQuestions > .container");

    for (const container of containers) {
      try {
        const question = await container.$eval("#questionText", (el) =>
          el.textContent.trim()
        );
        const answers = await container.$$eval(
          ".answers-container .answer-box",
          (answerBoxes) => answerBoxes.map((box) => box.textContent.trim())
        );
        const correctAnswerIndex = await container.$eval(
          "#answersbutton button[data-correct-answer]",
          (button) => parseInt(button.getAttribute("data-correct-answer"), 10)
        );
        const correctAnswer = answers[correctAnswerIndex - 1];

        const imageSrc = await container
          .$eval(".image-container img", (img) => img.getAttribute("src"))
          .catch(() => null);

        // Resolve a relative src against the current page URL instead of appending it to the query string
        const imageURL = imageSrc ? new URL(imageSrc, page.url()).href : null;

        console.log({
          question,
          answers,
          correctAnswer,
          imageURL,
        });
      } catch (error) {
        console.error("Error processing container:", error);
      }
    }
    await delay(5000);
  };

  // Process each page
  const totalPages = 55;
  for (let currentPage = 1; currentPage <= totalPages; currentPage++) {
    console.log(`Processing page ${currentPage}...`);
    await extractDataFromPage();

    // Go to the next page if not the last page
    if (currentPage < totalPages) {
      await page.click("#next");
      await page.waitForSelector("#examQuestions > .container", {
        timeout: 5000,
      });
    }
  }

  await browser.close();
})();

Here's the error message:

TargetCloseError: Protocol error (DOM.describeNode): Target closed

I think the problem is with the pagination handling. I tried adding a delay and using different versions of Node with different launch arguments, but neither helped.


Solution

  • Your script ran to completion for me, although sleeping and page.$$ tend to be slow and flaky, so I'd avoid both.

    That said, it looks like all of the quiz data is in a single, unprotected JSON file, so I'd just grab that with a request:

    fetch("https://starti.ge/exam/ka.json")
      .then(res => res.json())
      .then(console.log);
    

    Pretty easy, right?

    You can determine this by looking in the network tab of your browser's dev tools to see how the data is being sent to the page.
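
If you'd rather stay in Puppeteer, here is a rough sketch of how the sleeping and page.$$ calls mentioned above could be avoided. It is not tested against the live site. The selectors are copied from the question; the function names (extractQuestions, goToNextPage, scrapeAllPages), the 30-second timeout, and the assumption that clicking #next swaps the questions in place rather than triggering a full navigation are all mine. Each function takes an already-open Puppeteer page as its argument.

```javascript
// Sketch only: selectors come from the question. The assumption that clicking
// #next replaces the questions in place (no full navigation) is unverified.
const CONTAINER = "#examQuestions > .container";

// Read every question on the current page in one browser round trip,
// instead of holding element handles that can go stale between awaits.
async function extractQuestions(page) {
  return page.$$eval(CONTAINER, (containers) =>
    containers.map((container) => {
      const text = (el) => (el ? el.textContent.trim() : null);
      const answers = [
        ...container.querySelectorAll(".answers-container .answer-box"),
      ].map(text);
      const button = container.querySelector(
        "#answersbutton button[data-correct-answer]"
      );
      const index = button
        ? parseInt(button.getAttribute("data-correct-answer"), 10)
        : NaN;
      const img = container.querySelector(".image-container img");
      return {
        question: text(container.querySelector("#questionText")),
        answers,
        correctAnswer: Number.isNaN(index) ? null : (answers[index - 1] ?? null),
        imageSrc: img ? img.getAttribute("src") : null,
      };
    })
  );
}

// Click "next", then wait until the first question's text differs from the
// text seen before the click. This replaces the fixed 5-second sleep.
async function goToNextPage(page) {
  const firstQuestion = `${CONTAINER} #questionText`;
  const previous = await page.$eval(firstQuestion, (el) =>
    el.textContent.trim()
  );
  await page.click("#next");
  await page.waitForFunction(
    (selector, before) => {
      const el = document.querySelector(selector);
      return el !== null && el.textContent.trim() !== before;
    },
    { timeout: 30000 }, // arbitrary value, adjust as needed
    firstQuestion,
    previous
  );
}

// Collect the questions from every page into one array.
async function scrapeAllPages(page, totalPages) {
  const results = [];
  for (let current = 1; current <= totalPages; current++) {
    results.push(...(await extractQuestions(page)));
    if (current < totalPages) await goToNextPage(page);
  }
  return results;
}
```

In the original script you would call scrapeAllPages(page, 55) after page.goto(...) and print the returned array. Note that the wait condition never becomes true if two consecutive pages happen to start with the same question text, in which case waitForFunction times out; comparing some other per-page value would be more robust if the site exposes one.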