In several scripts, I use wkhtmltopdf from PowerShell to print web page content to PDF headlessly. This works well except on sites heavy with widgets and JavaScript, where the PDF output is a jumbled mess.
One such page offers a print button that calls the JavaScript function printSelectedDiv. This opens the Windows print dialog and prints exactly the desired div from the complex page.
Using PowerShell, I can automate clicking the button and submitting the print job. However, I want to do this headlessly in a scheduled task, like several of my other scripts.
I'm able to automate the printing with SendKeys as follows:
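# Start Internet Explorer through COM and load the page
$ie = New-Object -ComObject "InternetExplorer.Application"
$requestUri = ""
$ie.Silent = $true
$ie.Navigate($requestUri)
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
# Make "Microsoft Print to PDF" the default printer
$pdfPrinter = Get-WmiObject -Class Win32_Printer | Where-Object { $_.Name -eq "Microsoft Print to PDF" }
$pdfPrinter.SetDefaultPrinter() | Out-Null
# Find the print link (assumed to be matched by its id) and click it to open the print dialog
$printButton = $doc.getElementsByTagName("a") | Where-Object { $_.id -eq "btnPrintList" }
$printButton.click()
Start-Sleep -Seconds 2
# Drive the dialogs with keystrokes; the exact keys sent here are an assumption
$wshell = New-Object -ComObject WScript.Shell
$wshell.SendKeys("{ENTER}")
Start-Sleep -Milliseconds 500
# Keystrokes for the save dialog (file name, then Enter) go here
Start-Sleep -Milliseconds 500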
Is there a better way to script this process than sending keystrokes? I don't know whether sending keystrokes would work reliably, or at all, when running headless in a scheduled task.
If you'd like to keep using wkhtmltopdf, you can use this approach.
Your code already does most of the job. All you need to do is call the $ie.Navigate() method and then check back on $ie.Document: its Body.InnerHTML property will contain the HTML of the page you requested, so you can send that over to wkhtmltopdf.
$ie.Document.body.innerHTML > c:\temp\Page.html
& 'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' c:\temp\page.html c:\temp\page.pdf
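One caveat with the redirect above: in Windows PowerShell 5.1 the > operator writes UTF-16, and body.innerHTML is only a fragment with no head or charset declaration. A minimal sketch that wraps the fragment in a small HTML document and saves it as UTF-8 might look like this. Save-HtmlFragment is a hypothetical helper name, not part of wkhtmltopdf or PowerShell, and whether wkhtmltopdf actually needs this for your page is an assumption.

```powershell
# Hypothetical helper: wrap a body fragment in a minimal HTML document
# and save it as UTF-8 without a byte order mark.
function Save-HtmlFragment {
    param(
        [Parameter(Mandatory)] [string] $Fragment,
        # Use a full path: .NET file APIs do not follow the PowerShell location.
        [Parameter(Mandatory)] [string] $Path
    )
    $document = @"
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"></head>
<body>
$Fragment
</body>
</html>
"@
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)
    [System.IO.File]::WriteAllText($Path, $document, $utf8NoBom)
}
```

It would replace the redirect like so, with the same paths as above:

Save-HtmlFragment -Fragment $ie.Document.body.innerHTML -Path 'C:\temp\page.html'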
The only problem is resolving the image URLs. You'd have to rewrite the URLs in the tags, changing them from relative links to absolute links by substituting the leading / with the base URL of the page you're loading.
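As a rough sketch of that rewrite, the function below resolves every relative src and href value against the page URL using System.Uri. ConvertTo-AbsoluteUrl is a hypothetical helper name, and the regex is a simplification that assumes quoted attribute values; it is not a full HTML parser.

```powershell
# Hypothetical helper: turn relative src/href values into absolute URLs.
function ConvertTo-AbsoluteUrl {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $BaseUrl
    )
    $base = [System.Uri] $BaseUrl
    # Matches src="..." or href="..." with either quote style.
    $pattern = '(?i)\b(src|href)\s*=\s*(["''])(.*?)\2'
    [regex]::Replace($Html, $pattern, {
        param($m)
        $attr  = $m.Groups[1].Value
        $quote = $m.Groups[2].Value
        $value = $m.Groups[3].Value
        # Leave values that already have a scheme (http:, data:, ...) or are
        # in-page anchors untouched.
        if ($value -match '^(?:[a-z][a-z0-9+.\-]*:|#)') { return $m.Value }
        $resolved = New-Object System.Uri($base, $value)
        "$attr=$quote$($resolved.AbsoluteUri)$quote"
    })
}
```

You would pass it the HTML and the address you navigated to, for example ConvertTo-AbsoluteUrl -Html $ie.Document.body.innerHTML -BaseUrl $requestUri, and write the result to the file that wkhtmltopdf reads.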