python, pdf-generation, wkhtmltopdf

How to Convert Local HTML Files to PDF Including CSS and JavaScript


I currently have a project where HTML code is dynamically generated from spreadsheets and then converted to PDFs.

When I convert a file, I need to keep the CSS and JavaScript formatting (such as Bootstrap) and also maintain hyperlinks.


I have tried:

  • wkhtmltopdf through pdfkit in Python, which maintains hyperlinks but fails to keep any of my CSS/JS formatting. In my HTML file I have tried external, internal, and inline CSS, as some forums suggested, to no avail.
    This is what the pdfkit code looks like:
    import pdfkit
    import os

    cwd = os.getcwd()

    # Point pdfkit at the local wkhtmltopdf executable.
    path_wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)

    # Input and output share the same base name.
    file_name = "index"
    file_html = file_name + '.html'
    file_pdf = file_name + '.pdf'

    source_HTML = os.path.join(cwd, file_html)

    # "enable-local-file-access" lets wkhtmltopdf read local CSS/JS/image files.
    pdfkit.from_file(source_HTML, file_pdf, configuration=config, options={"enable-local-file-access": ""})

  • Simply using Microsoft Convert to PDF, which maintains formatting but fails to include hyperlinks and, more importantly, isn't automated (I need to produce 200+ PDFs at a time).

I have mostly written this project in Python, both to write the HTML and to convert it to PDFs. If there is a way to automate the conversion while maintaining the formatting and keeping hyperlinks using packages/libraries from other languages, I am more than willing to try.

I have heard that I could use LaTeX (from the ground up) to accomplish this goal, but I'd rather avoid switching from the HTML/CSS/JS framework.


Solution

  • I decided to use iText7 in C# to generate the PDF. Although many quality-of-life features of CSS are missing (see the note below), it DOES support formatting and gives me more control. It is a shame to have to handwrite a lot of styling that CSS already handled, but at least I can get exactly what I want, just with more effort.

    This is the code I used with iText7, for those who might come across this:

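    using System;
    using System.IO;
    using iText.Html2pdf;
    using iText.Kernel.Pdf;

    string outputPdfFilePath = "path/to/output.pdf";
    string htmlFilePath = "path/to/input.html";

    // The writer and document that will receive the converted content.
    PdfWriter writer = new PdfWriter(outputPdfFilePath);
    PdfDocument pdfDocument = new PdfDocument(writer);

    // Open the HTML read-only and release the file handle when done.
    using (FileStream htmlStream = new FileStream(htmlFilePath, FileMode.Open, FileAccess.Read))
    {
        HtmlConverter.ConvertToPdf(htmlStream, pdfDocument);
    }

    pdfDocument.Close();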
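
Since the rest of the project is in Python and needs 200+ PDFs at a time, a batch driver around the converter might look like the sketch below. It assumes the C# snippet above has been built into a console app that takes an input HTML path and an output PDF path as arguments; the executable name `HtmlToPdf.exe` and the helpers `run_converter` and `convert_folder` are hypothetical names chosen for illustration, not part of iText7.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical executable name: assumes the C# snippet was compiled into a
# console app that accepts <input.html> <output.pdf> as its two arguments.
CONVERTER_EXE = "HtmlToPdf.exe"


def run_converter(args: Sequence[str]) -> None:
    """Run the converter and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(args), check=True)


def convert_folder(
    html_dir: Path,
    pdf_dir: Path,
    runner: Callable[[Sequence[str]], None] = run_converter,
) -> List[Path]:
    """Request a PDF for every .html file in html_dir; return the PDF paths."""
    pdf_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    # Sorted so the processing order is stable from run to run.
    for html_path in sorted(html_dir.glob("*.html")):
        pdf_path = pdf_dir / (html_path.stem + ".pdf")
        runner([CONVERTER_EXE, str(html_path), str(pdf_path)])
        outputs.append(pdf_path)
    return outputs
```

Calling `convert_folder(Path("html"), Path("pdf"))` would then invoke the converter once per generated HTML file. The `runner` parameter is only there so the loop can be tried out without the real executable.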
    

    *To save future headaches, note that iText7:

    • Does not support the CSS calc() function, e.g. calc(1 * 0.2125in)
    • Does not support CSS variables, e.g. var(--some-var)
    • Does not support CSS Grid layout (display: grid)
    • Requires you to override the default page margins, like the following:
    @page {
        margin: 0.4250in;
    }
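
Because the HTML is generated from Python anyway, one way to work around the limitations listed above is to preprocess the stylesheet before conversion. The sketch below is only an illustration under simple assumptions: it uses regular expressions rather than a real CSS parser, treats every custom property as global, ignores `var()` fallbacks, and does not evaluate `calc()`. The function names `inline_css_variables` and `find_unsupported_css` are made up for this example.

```python
import re
from typing import Dict, List

# Custom property declarations such as "--gap: 0.2125in;"
_DECLARATION = re.compile(r"(--[\w-]+)\s*:\s*([^;}]+)")
# Simple uses such as "var(--gap)" (no fallback value)
_VAR_USE = re.compile(r"var\(\s*(--[\w-]+)\s*\)")

# Features from the list above, with a pattern that detects each one.
_CHECKS: Dict[str, "re.Pattern[str]"] = {
    "var()": re.compile(r"var\(\s*--"),
    "calc()": re.compile(r"calc\("),
    "grid": re.compile(r"display\s*:\s*(?:inline-)?grid"),
}


def inline_css_variables(css: str, max_passes: int = 10) -> str:
    """Replace var(--name) with the declared value of --name, where known."""
    variables = {name: value.strip() for name, value in _DECLARATION.findall(css)}

    def substitute(match: "re.Match[str]") -> str:
        # Unknown variables are left untouched so they can be flagged later.
        return variables.get(match.group(1), match.group(0))

    # Repeat so variables defined in terms of other variables resolve too;
    # max_passes guards against circular definitions.
    for _ in range(max_passes):
        resolved = _VAR_USE.sub(substitute, css)
        if resolved == css:
            break
        css = resolved
    return css


def find_unsupported_css(css: str) -> List[str]:
    """Return labels of the listed features that still appear in the CSS."""
    return [label for label, pattern in _CHECKS.items() if pattern.search(css)]
```

Running `inline_css_variables` on the generated stylesheet and then `find_unsupported_css` on the result gives a quick list of what still has to be rewritten by hand before handing the file to iText7.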
    

    Hope this helps others.