
Inline Image vs Temporary Files (Java XHTML->PDF generation)


I have a project where I need to generate a PDF file. Within this PDF I have to insert a body of text as well as four or five large images (roughly 800px*1000px). In order to make this flexible I have opted to use FreeMarker in conjunction with XHTMLRenderer (flying-saucer).

I am now faced with a couple of options:

  1. Create the images and save them as temporary files on disk. Then process an .xhtml template with FreeMarker (saving the result to disk) and pass the processed .xhtml file's URL to XHTMLRenderer to generate the PDF. All of these files (bar the PDF) would be created with File.createTempFile. This would allow the images to be picked up off the disk (as if they were ordinary images linked in the XHTML).
  2. Process the .xhtml template and keep it in memory. Pass the images to the template as Base64-encoded data URLs. This would remove the need for saving any temporary files, as the output from FreeMarker could be passed directly to XHTMLRenderer.
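
For illustration, the image-handling part of option 1 might look roughly like the sketch below. It uses only the standard library; the class name, the helper names and the `pdf-image-` prefix are made up for this example, and handing the resulting URL to the template and renderer is left out because it depends on how those are set up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for option 1: images live in temp files for the duration of one render.
final class TempImageSketch {

    /** Writes the image bytes to a new temp file and returns its path. */
    static Path writeTempImage(byte[] imageBytes, String suffix) {
        try {
            Path file = Files.createTempFile("pdf-image-", suffix);
            Files.write(file, imageBytes);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Removes the temp file once the PDF has been written. */
    static void deleteTempImage(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] placeholderImage = {1, 2, 3}; // stand-in for real image bytes
        Path image = writeTempImage(placeholderImage, ".png");
        try {
            // This file: URL is what the template would put into <img src="...">.
            System.out.println(image.toUri());
        } finally {
            deleteTempImage(image);
        }
    }
}
```

The `try`/`finally` matters here: with four or five images per PDF, files that are not cleaned up after a failed render will accumulate in the temp directory.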

Base64-encoded image URL example (a small folder icon):

<img src="data:image/gif;base64,R0lGODlhEAAOALMAAOazToeHh0tLS/7LZv/0jvb29t/f3//Ub/
/ge8WSLf/rhf/3kdbW1mxsbP//mf///yH5BAAAAAAALAAAAAAQAA4AAARe8L1Ekyky67QZ1hLnjM5UUde0ECwLJoExK
cppV0aCcGCmTIHEIUEqjgaORCMxIC6e0CcguWw6aFjsVMkkIr7g77ZKPJjPZqIyd7sJAgVGoEGv2xsBxqNgYPj/gAwXEQA7" />
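
A data URL like the one above can be built in memory with `java.util.Base64` (Java 8+). The following is a minimal sketch; the class and method names are hypothetical, and the six-byte GIF header stands in for real image data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for option 2: the image never touches the disk.
final class DataUrlSketch {

    /** Builds a string of the form "data:<mimeType>;base64,<encoded bytes>". */
    static String toDataUrl(String mimeType, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // Stand-in for real image bytes: just the 6-byte GIF header.
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        // prints "data:image/gif;base64,R0lGODlh", the same prefix as the folder icon above
        System.out.println(toDataUrl("image/gif", gifHeader));
    }
}
```

The returned string would be placed in the FreeMarker data model and written into the `src` attribute of the `<img>` element by the template.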

My main question is: which would be the better technique? Is creating lots of temporary files bad (does it carry a lot of overhead)? Could I potentially run out of memory creating such large Base64-encoded strings?


Solution

  • I found myself asking the same question recently. After some benchmarking, it turned out the data URI approach was the best bet.

    Storing a bunch of Base64-encoded images can be expensive. But the overhead of creating temp files, streaming image data in, then waiting for XHTMLRenderer to hit that temp file 4 times before cleaning it up is also taxing.

    In my experiments, the Base64 images proved to be the better approach. That being said, I'm not sure to what extent that will remain true for larger images. In my case, I was testing with 32x32 icons, 80x80 logos, 400x240 bar graphs and one 600x400 graphic. The difference in overhead was significant with everything except the 600x400 graphic, where it became negligible.

    (A side note for Joop Eggen: in my case, PDF generation is time-critical. The user clicks a button for the PDF and expects the download to begin immediately.)
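
On the memory side of the trade-off, the cost of a Base64 string can be estimated up front: padded Base64 emits 4 characters for every 3 input bytes, rounded up. Note that the input is the compressed image file, not the raw pixel count. The sketch below is a rough illustration only; the class name and the 1 MB figure are assumptions, not measurements from the experiments above.

```java
import java.util.Base64;

// Hypothetical estimator for the size of a Base64-encoded image.
final class Base64SizeSketch {

    /** Characters produced by padded Base64 for rawBytes input bytes: 4 * ceil(rawBytes / 3). */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long rawBytes = 1_000_000; // assumed size of one compressed image file
        // prints 1333336, i.e. roughly one third larger than the file itself
        System.out.println(encodedLength(rawBytes));

        // Cross-check the formula against the real encoder on a small input.
        int actual = Base64.getEncoder().encodeToString(new byte[10]).length();
        System.out.println(actual == encodedLength(10)); // prints true
    }
}
```

For four or five images of that assumed size, the encoded strings come to a few megabytes in total, plus whatever copies the template engine and renderer make while processing the document.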