Tags: jquery, file, google-chrome-devtools, puppeteer, apify

How to download files through Apify


I am using apify/puppeteer-scraper to click the download button and export the data on this page to CSV:

I've managed to simulate the mouse click on the download button (in the bottom right) and got as far as the CSV download. Now I want to save the resulting CSV file to a location where it can be consumed.

Is there any way to do this? I tried to use require('fs'), but it doesn't seem to work in the Apify scraper.

This is my code so far inside pageFunction(context):

// Allow downloads before triggering the export. In the original order this call ran only
// after the 60 s wait, so it could not apply to a download started by the clicks above.
// Note: page._client is a private Puppeteer API and may change between versions.
await page._client.send('Page.setDownloadBehavior', { behavior: 'allow', downloadPath: './downloads' });

const dialogButton = '#DownloadDialog-Dialog-Body-Id > div > button:nth-child(4)';
const exportDialog = '#export-crosstab-options-dialog-Dialog-BodyWrapper-Dialog-Body-Id > div';
const optionLabel = `${exportDialog} > div.foyjxgp > div:nth-child(2) > div > label:nth-child(2)`;
const confirmButton = `${exportDialog} > div.fdr6v0d > button`;

// Run the steps one after another; wrapping already-awaited calls in Promise.all added nothing.
await page.click('#download-ToolbarButton > span.tabToolbarButtonImg.tab-icon-download');
await page.waitForSelector(dialogButton);
await page.focus(dialogButton);
await page.click(dialogButton);
await page.waitFor(5000);

await page.waitForSelector(optionLabel);
await page.focus(optionLabel);
await page.waitForSelector(confirmButton);
await page.click(optionLabel);
// await page.click(confirmButton); // left disabled, as in the original attempt
await page.waitFor(60000); // give the download time to finish


Solution

  • Puppeteer Scraper is not designed to be extended with other libraries, not even native Node.js modules such as fs. You could probably adapt the Web Scraper workaround linked below, but the recommended approach is to write a custom actor instead of using a ready-made scraper, because a custom actor lets you require any library you need. If you can't build such an actor yourself, you can post a project request on the marketplace.

    Workaround and the scraper's context object: https://help.apify.com/en/articles/3211799-how-to-add-external-libraries-to-web-scraper https://apify.com/apify/puppeteer-scraper#context

    Custom actors and the marketplace: https://sdk.apify.com/docs/guides/getting-started https://apify.com/marketplace
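
To illustrate what a custom actor makes possible once require('fs') is available, here is a minimal sketch of the file-handling part only. It assumes Chrome saves into the './downloads' directory from the question and that unfinished downloads carry Chrome's '.crdownload' suffix. The helper names findCompletedDownload and waitForDownload are made up for this example and are not part of Puppeteer or the Apify SDK.

```javascript
const fs = require('fs');
const path = require('path');

// Chrome writes in-progress downloads with a .crdownload suffix,
// so a file counts as finished only once no such suffix is present.
function findCompletedDownload(dir) {
    if (!fs.existsSync(dir)) return null;
    const done = fs.readdirSync(dir).filter((name) => !name.endsWith('.crdownload'));
    return done.length > 0 ? path.join(dir, done[0]) : null;
}

// Poll the directory until a finished file shows up or the timeout expires.
async function waitForDownload(dir, timeoutMs = 60000, pollMs = 250) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        const file = findCompletedDownload(dir);
        if (file) return file;
        await new Promise((resolve) => setTimeout(resolve, pollMs));
    }
    throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}

// Hypothetical usage inside a custom actor, after the export click
// (assumes an SDK version that exposes Apify.setValue):
//   const file = await waitForDownload('./downloads');
//   await Apify.setValue('OUTPUT', fs.readFileSync(file), { contentType: 'text/csv' });
```

Polling the directory replaces the fixed 60-second wait with a wait that ends as soon as the file is complete. Storing the file in the key-value store, as in the commented usage, is one way to make it available outside the run; treat that part as an assumption about your SDK version rather than a confirmed recipe.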