
Creating image (PNG or JPEG) from PDF along with HTML image maps of text in the image?


I am documenting a system I maintain. The documentation contains a diagram I created in TeX/TikZ, which gets rendered to a PDF file. I then convert the PDF file to an image file (PNG via ImageMagick) and include it in my HTML documentation. Works great.

Now I would like to create an image map for the image, so that I can add hyperlinks/mouseovers/etc. This is an image that I expect to update periodically based on changes in my system, so I would like to automate this process if possible.

Is there a way to use a software library or tool to automatically create image maps of the various text content in the PDF file, when it gets rendered to PNG?

Here is an example from this gist I created:

[image: example diagram]

In this case I would like to turn some of the various text strings into hyperlinks by locating their bounding box in the PDF:

  • controller
  • actuator
  • sensor
  • A
  • B
  • C
  • D
  • u
  • y
  • F(s)
  • G(s)
  • H(s)

(They are all text content in the PDF file; I can select the text of any of them in Acrobat Reader and copy + paste into my text editor.)

Is there a way to do this?


Solution

  • I put together the following Python script, which could serve as a starting point. It converts the PDF to a PNG and prints the corresponding image map markup.

    It takes the output DPI as an optional argument (default 200) and uses it to scale the bounding boxes from PDF units (72 per inch) to PNG pixels:

    from pdf2image import convert_from_path
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox
    from pdfminer.pdfinterp import PDFPageInterpreter
    from pdfminer.pdfinterp import PDFResourceManager
    from pdfminer.pdfpage import PDFPage
    
    from yattag import Doc, indent
    
    import argparse
    import os
    
    
    def transform_coords(lobj, mb):
    
        # Transform LTTextBox bounding box to image map area bounding box.
        #
        # The bounding box of each LTTextBox is specified as:
        #
        # x0: the distance from the left of the page to the left edge of the box
        # y0: the distance from the bottom of the page to the lower edge of the box
        # x1: the distance from the left of the page to the right edge of the box
        # y1: the distance from the bottom of the page to the upper edge of the box
        #
        # So the y coordinates start from the bottom of the image. But with image map
        # areas, y coordinates start from the top of the image, so here we subtract
        # the bounding box's y-axis values from the total height.
    
        return [lobj.x0, mb[3] - lobj.y1, lobj.x1, mb[3] - lobj.y0]
    
    
    def get_imagemap(d):
        doc, tag, text = Doc().tagtext()
        with tag("map", name="map"):
            for k, v in d.items():
                # href="#" is a placeholder link, matching the example result.
                doc.stag("area", shape="rect", coords=",".join(v), href="#", alt=k)
        return indent(doc.getvalue())
    
    
    def get_bboxes(pdf, dpi):
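        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
    
        # Keep the file open while the page is parsed and laid out, and close it
        # afterwards. Only the first page of the PDF is processed.
        with open(pdf, "rb") as fp:
            page = next(PDFPage.get_pages(fp))
            interpreter.process_page(page)
            layout = device.get_result()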
    
        # PDFminer reports bounding boxes based on a dpi of 72. I could not find a way
        # to change this, so instead I scale each coordinate by multiplying by dpi/72
        scale = dpi / 72.0
    
        return {
            lobj.get_text().strip(): [
                str(int(x * scale)) for x in transform_coords(lobj, page.mediabox)
            ]
            for lobj in layout
            if isinstance(lobj, LTTextBox)
        }
    
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("pdf")
        parser.add_argument("--dpi", type=int, default=200)
    
        args = parser.parse_args()
    
        page = list(convert_from_path(args.pdf, args.dpi))[0]
        page.save(f"{os.path.splitext(args.pdf)[0]}.png", "PNG")
    
        print(get_imagemap(get_bboxes(args.pdf, args.dpi)))
    
    
    if __name__ == "__main__":
        main()
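
The coordinate handling in transform_coords and get_bboxes can be checked on its own, without a PDF. The sketch below is only an illustration: pdf_bbox_to_area_coords is a hypothetical helper, not part of the script, and the page size and box are made-up numbers. Unlike the script, it also subtracts the left edge of the MediaBox, which is an assumption on my part; the result is the same whenever the MediaBox starts at (0, 0).

```python
def pdf_bbox_to_area_coords(bbox, mediabox, dpi=200):
    """Convert a PDF-space bounding box to image-map rect coordinates.

    bbox and mediabox are (x0, y0, x1, y1) in PDF units (1/72 inch),
    measured from the bottom-left of the page. The result is
    [left, top, right, bottom] in pixels, measured from the top-left.
    """
    scale = dpi / 72.0
    page_left, _page_bottom, _page_right, page_top = mediabox
    x0, y0, x1, y1 = bbox
    left = (x0 - page_left) * scale
    top = (page_top - y1) * scale      # upper edge: distance from page top
    right = (x1 - page_left) * scale
    bottom = (page_top - y0) * scale   # lower edge: distance from page top
    # Truncate to whole pixels, as the script does with int().
    return [int(v) for v in (left, top, right, bottom)]


# Hypothetical input: a 612 x 792 unit page with a box that starts one inch
# from the left and one inch below the top, 2 in wide and 0.5 in tall,
# rendered at 144 dpi (scale factor 2).
coords = pdf_bbox_to_area_coords((72, 684, 216, 720), (0, 0, 612, 792), dpi=144)
print(coords)  # [144, 144, 432, 216]
```

The y values swap roles in the flip: the box's upper edge (y1) becomes the smaller pixel value, so the output stays in the left, top, right, bottom order that a rect area expects.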
    

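    Example result for the diagram in the question (the script prints only the <map> element; the <img> tag is added by hand):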

    <img src="https://i.sstatic.net/aXWMc.png" usemap="#map">
    <map name="map">
      <area shape="rect" coords="361,8,380,43" href="#" alt="B" />
      <area shape="rect" coords="434,31,500,64" href="#" alt="G(s)" />
      <area shape="rect" coords="432,93,502,117" href="#" alt="actuator" />
      <area shape="rect" coords="552,8,572,42" href="#" alt="C" />
      <area shape="rect" coords="596,58,609,86" href="#" alt="y" />
      <area shape="rect" coords="105,26,119,40" href="#" alt="+" />
      <area shape="rect" coords="107,54,122,78" href="#" alt="−" />
      <area shape="rect" coords="35,58,51,86" href="#" alt="u" />
      <area shape="rect" coords="164,8,182,43" href="#" alt="A" />
      <area shape="rect" coords="163,152,183,187" href="#" alt="D" />
      <area shape="rect" coords="241,31,311,64" href="#" alt="H(s)" />
      <area shape="rect" coords="236,94,316,118" href="#" alt="controller" />
      <area shape="rect" coords="243,175,309,208" href="#" alt="F (s)" />
      <area shape="rect" coords="247,234,305,258" href="#" alt="sensor" />
    </map>
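
Since the question asks to link only some of the strings, one possible next step is to filter the extracted boxes against a table of links. The following is a rough sketch using only the standard library. build_imagemap, the links dict and the target URLs are all hypothetical names chosen for illustration; the bboxes dict has the same shape as the one get_bboxes returns, with three entries copied from the result above.

```python
from html import escape


def build_imagemap(bboxes, links, name="map"):
    """Return <map> markup with one <area> per label that has a link.

    bboxes: dict of label -> list of four coordinate strings.
    links:  dict of label -> URL (hypothetical; you supply this).
    """
    lines = ['<map name="{}">'.format(escape(name))]
    for label, coords in bboxes.items():
        href = links.get(label)
        if href is None:
            continue  # text was found in the PDF, but no link is wanted
        lines.append(
            '  <area shape="rect" coords="{}" href="{}" alt="{}" />'.format(
                ",".join(coords), escape(href), escape(label)
            )
        )
    lines.append("</map>")
    return "\n".join(lines)


# Three boxes taken from the example result; "+" is deliberately not linked.
bboxes = {
    "B": ["361", "8", "380", "43"],
    "G(s)": ["434", "31", "500", "64"],
    "+": ["105", "26", "119", "40"],
}
# Made-up link targets for illustration.
links = {"B": "signals.html#B", "G(s)": "blocks.html#G"}

markup = build_imagemap(bboxes, links)
print(markup)
```

The keys in links have to match the extracted text exactly. In the result above one label came out as "F (s)" with a space, so it is worth printing the extracted labels once before writing the link table.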