node.js web-scraping web-crawler puppeteer chromium

Puppeteer always loads mobile script on remote server


I'm trying to scrape the scripts of this URL (headless), and I notice that whenever I run it on my local machine I get the "https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerDesktop.min.js" script.

The issue is that when I call the scraping API on a remote server (via Postman), I always get a script that should only appear on mobile devices: "https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerMobile.min.js"

This is my code:

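```typescript
// Method of a class that keeps a launched Puppeteer Browser in `this.browser`.
public async fetchScripts(url: string, waitFor = 'cdn-pipes.js'): Promise<string[] | undefined> {
  const page = await this.browser.newPage();

  try {
    await page.goto(url, {timeout: 10000, waitUntil: 'domcontentloaded'});

    // Poll until the marker script (or "spa-detector") appears in the markup;
    // with no marker, wait for the document to finish loading instead.
    const func = waitFor
      ? `document.documentElement.innerHTML.indexOf("${waitFor}") !== -1 || document.documentElement.innerHTML.indexOf("spa-detector") !== -1`
      : 'document.readyState === "complete"';
    await page.waitForFunction(func, {polling: 500, timeout: 8000}).catch(reason => {
      // A timeout is logged but not fatal: whatever scripts are present get collected.
      console.error('page.waitForFunction', {error: reason, url});
    });

    // Collect the unique src URLs of all <script> elements served from taboola.com.
    const pageUrls = await page.evaluate(() => {
      const urlArray = Array.from(document.scripts)
        .map((script) => script.src)
        .filter((value) => value.includes('taboola.com'));

      return [...new Set(urlArray)];
    });

    console.log('fetchMinimal - urlsArray ', {pageUrls});

    return pageUrls;
  } catch (e) {
    // Errors are logged and the method resolves to undefined.
    console.error('fetchMinimal - error ', e);
  } finally {
    await page.close();
  }
}
```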
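To make it obvious which variant a given run picked up, the URLs returned by `fetchScripts` can be classified by file name. This is a minimal sketch, assuming the variant is identified only by the `UnitFeedManagerDesktop` / `UnitFeedManagerMobile` file names quoted in the question; `detectFeedManagerVariant` is a hypothetical helper, not part of Puppeteer or Taboola.

```typescript
type FeedManagerVariant = 'desktop' | 'mobile' | 'both' | 'none';

// Hypothetical helper: inspects script URLs (e.g. the array returned by
// fetchScripts) and reports which UnitFeedManager build the page loaded.
function detectFeedManagerVariant(scriptUrls: string[]): FeedManagerVariant {
  const hasDesktop = scriptUrls.some((u) => u.includes('UnitFeedManagerDesktop'));
  const hasMobile = scriptUrls.some((u) => u.includes('UnitFeedManagerMobile'));
  if (hasDesktop && hasMobile) return 'both';
  if (hasDesktop) return 'desktop';
  if (hasMobile) return 'mobile';
  return 'none';
}

// Example: the URL reported when scraping from the remote server.
const remoteRun = ['https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerMobile.min.js'];
console.log(detectFeedManagerVariant(remoteRun)); // mobile
```

Logging this next to the raw URL list makes a local-vs-server comparison a one-word diff.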

I suspect this is a CDN issue, with old scripts being cached somehow, but I'm not sure. Any thoughts?

UPDATE:

It's happening because the page loads the mobile script only if window.matchMedia(" only screen and (min-device-width : 320px) and (max-device-width : 480px)").matches is true, and that condition was always true with chromium-browser on the remote server.
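The condition can be modeled outside the browser to see which widths trigger the mobile build. This is a sketch, assuming the page's check is exactly the query quoted above; `device-width` refers to the width of the output device's screen (what `screen.width` reports), not the page's viewport. `isMobileDeviceWidth` is a hypothetical name.

```typescript
// Hypothetical model of the page's query: true when the reported device
// (screen) width falls in the inclusive 320-480 CSS px range, i.e. when the
// page would choose UnitFeedManagerMobile.min.js.
function isMobileDeviceWidth(deviceWidthPx: number): boolean {
  return deviceWidthPx >= 320 && deviceWidthPx <= 480;
}

for (const width of [320, 480, 800, 1920]) {
  console.log(`${width}px -> ${isMobileDeviceWidth(width) ? 'mobile' : 'desktop'} script`);
}
```

On the server itself, the real values can be compared with something like `await page.evaluate(() => ({ screenWidth: screen.width, mobile: window.matchMedia('only screen and (min-device-width : 320px) and (max-device-width : 480px)').matches }))`, which shows whether the reported screen width is what lands in the mobile range.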


Solution

  • Thanks to this answer, I managed to solve it by passing args: ['--window-size=1920,1080'] to puppeteer.launch.
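Put together, the launch options might look like the sketch below. It only builds the options object, so it runs without Chromium; `buildLaunchOptions` and `DesktopLaunchOptions` are hypothetical names, and `headless: true` is an assumption about the original setup. The `--window-size` flag is the one from the solution.

```typescript
// Subset of the options accepted by puppeteer.launch() that matter here.
interface DesktopLaunchOptions {
  headless: boolean;
  args: string[];
}

// Hypothetical helper: request a desktop-sized browser window, intended to
// make the screen width reported to pages (and therefore `device-width`
// media queries) fall outside the 320-480px "mobile" range.
function buildLaunchOptions(width = 1920, height = 1080): DesktopLaunchOptions {
  return {
    headless: true,
    args: [`--window-size=${width},${height}`],
  };
}

const options = buildLaunchOptions();
console.log(options.args); // [ '--window-size=1920,1080' ]

// With Puppeteer installed, the options are passed straight through:
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Note that the window size and the page viewport are separate: Puppeteer still gives each new page its default 800x600 viewport unless `defaultViewport` is changed (for example to `null`, so pages follow the window size). The solution above did not need that, so it is left out of the sketch.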