I have been trying to convert html to pdf. I have tried a lot of tools but none of them work. Now I am using playwright, it is converting the Page to PDF but it only gets the first screen view. From that page the content from right is trimmed.
import os
import time
import pathlib
from playwright.sync_api import sync_playwright
filePath = os.path.abspath("Lab6.html")
fileUrl = pathlib.Path(filePath).as_uri()
fileUrl = "file://C:/Users/PMYLS/Desktop/Code/ScribdPDF/Lab6.html"
with sync_playwright() as p:
browser = p.chromium.launch()
page = browser.new_page()
page.goto(fileUrl)
for i in range(5): #(The scroll is not working)
page.mouse.wheel(0, 15000)
time.sleep(2)
page.wait_for_load_state('networkidle')
page.emulate_media(media="screen")
page.pdf(path="sales_report.pdf")
browser.close()
Html View
PDF file after running script I have tried almost every tool available on the internet. I also used selenium but same results. I thought it was due to page not loaded properly, I added wait and manually scrolled the whole page to load the content. All giving same results.
The html I am converting https://drive.google.com/file/d/16jEq52iXtAMCg2FDt3VbQN0dCQmdTip_/view?usp=sharing
Here's a somewhat dirty solution that worked on my end. The sleep
and scroll isn't great and can probably be improved, but I'll leave this as a starter and see if I have time to tighten it up later (feel free to do the same).
from playwright.sync_api import sync_playwright # 1.37.0
from time import sleep
with open("index.html") as f:
html = f.read()
with sync_playwright() as p:
browser = p.chromium.launch()
page = browser.new_page()
page.set_content(html)
# focus inside the annoying border to enable scroll
page.click(".document_container")
for i in range(10):
page.mouse.wheel(0, 2500)
sleep(0.5)
# strip out the annoying border that messes up PDF generation
page.evaluate("""() => {
const el = document.querySelector(".document_scroller");
el.parentElement.appendChild(el.querySelector(".document_container"));
el.remove();
}""")
page.emulate_media(media="screen")
page.pdf(path="sales_report.pdf")
browser.close()
Two tricks: