java flying-saucer html-to-pdf html-to-jpeg

Best Java lib for programmatically converting an HTML page to an image/PDF


I am looking for the best Java lib to which I can pass a URL and have it create an image of the web page as it would look in a browser. I tried Flying Saucer, but almost every web page breaks it: it won't even render www.google.com or yahoo.com. The only site I could get it to render is www.w3c.org!

Any thoughts on a better tool to use, or on a way to make Flying Saucer more lax about the XHTML it accepts?


Solution

  • Flying Saucer fails on many pages because it only accepts well-formed XHTML (see the manual).

    But you can use an HTML library to "clean" your input and then pass the result to Flying Saucer.

    Website -> "Cleaner" -> Flying Saucer

    Some good and free libs are:

    1. JSoup (personal recommendation)
    2. HtmlCleaner
    3. JTidy (sometimes more strict than needed)
    4. Jericho HTML
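
To illustrate why the cleaning step matters, here is a minimal sketch that checks whether markup is well-formed XML, which is what an XHTML-only renderer needs. It uses the JDK's own XML parser as a stand-in, not Flying Saucer's parser, and the class and method names (`WellFormedCheck`, `isWellFormedXml`) are made up for this example. Typical "tag soup" from a real page fails the check, while the cleaned form passes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Flying Saucer or of any cleaner library.
final class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Malformed markup: unclosed tags, bad nesting, and so on.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML as browsers tolerate it: <br> and <p> are never closed.
        String tagSoup = "<html><body><p>Hello<br></body></html>";
        // The same content after cleaning: every element is closed.
        String xhtml = "<html><body><p>Hello<br/></p></body></html>";

        System.out.println(isWellFormedXml(tagSoup)); // prints false
        System.out.println(isWellFormedXml(xhtml));   // prints true
    }
}
```

In the pipeline above, a cleaner from the list would turn the first form into the second before the markup reaches Flying Saucer. A check like this can help tell whether a rendering failure comes from malformed input or from something else.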