Tags: python, download, playwright, headless-browser, playwright-python

Playwright: Download via Print to PDF?


I'm seeking to scrape a web page using Playwright.

I load the page, and click the download button with Playwright successfully. This brings up a print dialog box with a printer selected.

[Image: print dialog box]

I would like to select "Save as PDF" and then click the "Save" button.

Here's my current code:

import os

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

# url_to_start_from and DOWNLOADED_PDF_FOLDER are assumed to be defined elsewhere in the script.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    playwright_page = browser.new_page()
    got_error = False

    try:
        playwright_page.goto(url_to_start_from)
        print(playwright_page.title())
        html = playwright_page.content()
    except Exception as e:
        print(f"Playwright exception: {e}")
        got_error = True

    if not got_error:
        soup = BeautifulSoup(html, 'html.parser')

        # Click the download button and wait for a download event
        with playwright_page.expect_download() as download_info:
            playwright_page.locator("text=download").click()

        download = download_info.value
        # save_as() expects a full file path, not a folder
        download.save_as(os.path.join(DOWNLOADED_PDF_FOLDER, download.suggested_filename))

    browser.close()

Is there a way to do this using Playwright?


Solution

  • Thanks very much to @KJ in the comments, who suggested that with headless=True, Chromium won't even put up a print dialog box in the first place.
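Since headless Chromium never shows the print dialog, one possible workaround is to skip the dialog and ask Playwright to render the loaded page to PDF directly with page.pdf(). The following is only a minimal sketch under assumptions, not necessarily what the original poster ended up using: the helper names save_page_as_pdf and download_page_pdf, the default filename "page.pdf", and the A4 / print_background settings are all illustrative choices. It also assumes that the page loaded by goto() is the same content the site's download button would have printed, which may not hold for every site.

```python
from pathlib import Path


def save_page_as_pdf(page, folder, filename="page.pdf"):
    """Render an already-loaded Playwright page to a PDF and return the file path.

    `page` is expected to be a Playwright Page opened in headless Chromium.
    The function name and the default filename are hypothetical.
    """
    target = Path(folder) / filename
    # Create the output folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # page.pdf() writes the PDF itself, so no print dialog is involved.
    page.pdf(path=str(target), format="A4", print_background=True)
    return target


def download_page_pdf(url, folder):
    """Open `url` in headless Chromium and save it as a PDF in `folder`."""
    # Imported here so save_page_as_pdf can be reused without starting a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        saved = save_page_as_pdf(page, folder)
        browser.close()
    return saved
```

With the variables from the question, this could be called as download_page_pdf(url_to_start_from, DOWNLOADED_PDF_FOLDER). Passing the page in as an argument keeps the PDF step separate from browser setup, so it could also be dropped into the existing with sync_playwright() block in place of the expect_download() section.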