python, wkhtmltopdf, pdfkit

pdfkit changes href from relative to absolute paths on conversion


I'm using pdfkit to convert HTML files that contain links with href attributes.

Inside the HTML, the hrefs are written as relative paths, e.g.:

<a href="folder/picture.jpg">PIC</a>

When I convert this to PDF, the hrefs seem to be automatically rewritten to absolute paths (C:/Users/...).

Why does pdfkit change the href?


Solution

  • Wkhtmltopdf, which pdfkit relies on, converts relative links to absolute links by default.

    This can be prevented by running the command-line tool with the --keep-relative-links flag:

    wkhtmltopdf --keep-relative-links src destination
    

    Or by telling pdfkit to apply this option:

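    import sys

    import pdfkit

    # Placeholder: set this to the wkhtmltopdf executable on your machine.
    path_wkhtmltopdf = r'C:\path\to\wkhtmltopdf.exe'

    def convert_to_pdf(path):
        try:
            # run the conversion and write the result to a file
            config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
            options = {
                # an empty value makes pdfkit pass the flag without an argument
                '--keep-relative-links': ''
            }
            pdfkit.from_url(path+'.htm', path+'.pdf', configuration=config, options=options)
        except Exception as why:
            # report the error
            sys.stderr.write('Pdf Conversion Error: {}\n'.format(why))
            raise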
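
For comparison, the command-line form can also be driven from Python without pdfkit. This is only a minimal sketch, assuming wkhtmltopdf is on the PATH; the helper names build_command and convert are hypothetical and not part of pdfkit or wkhtmltopdf.

```python
import subprocess


def build_command(src, dest, keep_relative_links=True, executable="wkhtmltopdf"):
    """Build the argument list for a wkhtmltopdf call (hypothetical helper)."""
    cmd = [executable]
    if keep_relative_links:
        # Same flag as in the command-line example above.
        cmd.append("--keep-relative-links")
    cmd.extend([src, dest])
    return cmd


def convert(src, dest, **kwargs):
    """Run wkhtmltopdf; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(src, dest, **kwargs), check=True)
```

Calling convert('page.htm', 'page.pdf') would then run wkhtmltopdf --keep-relative-links page.htm page.pdf. Keeping the argument list in its own function makes it easy to check which flags are passed before anything is executed.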