Script pdf printing of webpage that has printSelectedDiv javascript using Powershell


In several scripts, I use wkhtmltopdf from PowerShell to print web page content to PDF headlessly. This works well, except on sites heavy with widgets and JavaScript, where the PDF output is a jumbled mess.

One such page offers a print button that calls a JavaScript function, printSelectedDiv. It opens the Windows print dialog and prints exactly the desired div from the complex page.

Using PowerShell, I can automate clicking the button and submitting the print job. However, I want this to run headlessly in a scheduled task, like several of my other scripts.

I can automate the printing with SendKeys as follows:

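# Open the page in Internet Explorer through COM automation
$ie = New-Object -ComObject "InternetExplorer.Application"
$requestUri = "https://www.complexpagefullofwidgets.com"
$ie.Silent = $true
$ie.Navigate($requestUri)
# Wait until the page has finished loading (ReadyState 4 = complete)
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document

# Make "Microsoft Print to PDF" the default printer
$pdfPrinter = Get-WmiObject -Class Win32_Printer | Where-Object { $_.Name -eq "Microsoft Print to PDF" }
$pdfPrinter.SetDefaultPrinter() | Out-Null

# Click the page's own print link, which opens the Windows print dialog
$printButton = $doc.getElementsByTagName("a") | Where-Object { $_.id -eq "btnPrintList" }
$printButton.click()

Start-Sleep -Seconds 2

# Drive the print and save dialogs with keystrokes
$wshell = New-Object -ComObject WScript.Shell
$wshell.SendKeys("{ENTER}")          # confirm the print dialog
Start-Sleep -Milliseconds 500
$wshell.SendKeys("%n")               # Alt+N: focus the file name field
Start-Sleep -Milliseconds 500
$wshell.SendKeys("c:\temp\temp.pdf")
$wshell.SendKeys("{ENTER}")          # save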

Is there a better way to script this process than sending keystrokes? I don't know whether sending keystrokes would work reliably, or at all, when running headless in a scheduled task.


Solution

  • If you'd like to keep using wkhtmltopdf, you can use this approach.

    Your code already does most of the work. After calling $printButton.click(), check $ie.Document again: its body.innerHTML property will contain the HTML of the page you requested, so you can save that to a file and hand it to wkhtmltopdf.

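    # Save the rendered HTML as UTF-8 (in Windows PowerShell, ">" writes UTF-16)
    $ie.Document.body.innerHTML | Out-File -FilePath c:\temp\page.html -Encoding utf8
    # Convert the saved HTML file to PDF
    & 'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' c:\temp\page.html c:\temp\page.pdf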

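    The only problem is resolving the image URLs. Because the HTML is now loaded from a local file, relative links in the tags no longer point at the site, so you'd have to change them from relative to absolute links, for example by prefixing paths that begin with / with the full URL of the site you're loading.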
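
The URL rewriting described above could be sketched roughly as follows. This is a minimal illustration, not part of the original answer: the function name Convert-RelativeUrl is hypothetical, and it assumes PowerShell 5 or later and that the attributes of interest are quoted src and href values. A regex will miss unquoted attributes and URLs inside inline CSS, so treat it as a starting point rather than a complete HTML rewriter.

```powershell
# Hypothetical helper (not from the original answer): resolves quoted src/href
# attribute values in an HTML string against a base URL.
function Convert-RelativeUrl {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $BaseUrl
    )

    $base = [System.Uri] $BaseUrl

    # Group 1: attribute name, group 2: quote character, group 3: URL value.
    # The lookbehind skips attributes such as data-src.
    $pattern = '(?i)(?<![\w-])(src|href)\s*=\s*(["''])(.*?)\2'

    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        $value = $m.Groups[3].Value

        # Leave empty values and in-page fragments untouched
        if ($value -eq '' -or $value.StartsWith('#')) { return $m.Value }

        try {
            # Uri(Uri, String) resolves relative values against the base and
            # returns values that are already absolute unchanged
            $resolved = [System.Uri]::new($base, $value)
        }
        catch {
            # If the value cannot be parsed, keep the original attribute
            return $m.Value
        }

        $quote = $m.Groups[2].Value
        return $m.Groups[1].Value + '=' + $quote + $resolved.AbsoluteUri + $quote
    }

    [regex]::Replace($Html, $pattern, $evaluator)
}

# Example with a literal string; example.com is a placeholder
$sample = '<img src="/img/a.png"><a href="#top">x</a>'
Convert-RelativeUrl -Html $sample -BaseUrl 'https://www.example.com/dir/page.html'
```

In the script above, you would pass $ie.Document.body.innerHTML as -Html and $requestUri as -BaseUrl, then write the result to c:\temp\page.html before calling wkhtmltopdf.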