I'm trying to generate a PDF version of a third-party HTML page (actually an .htm file). This HTML may change in the future and I have absolutely no control over it. All I want to do is convert it to a PDF.
I have already tried two solutions, iText (with XMLWorker) and Flying-Saucer, but with no success so far.
My problem is that the HTML file is far from well-formed. Examples:
    <link rel=File-List href="040602_inds_files/filelist.xml">
    <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
The first one has no closing tag (iText crashes) and the second one has no double quotes around the 'http-equiv' value (Flying-Saucer crashes).
I have found a lot of posts about this issue, but in all of them the authors are handling their own HTML, so they can fix it and try again. I can't do that.
This is the page I'm trying to convert.
Here is my iText convert method:
    public static void convert(PdfWriter writer, Document document, String siteUrl) throws MalformedURLException, IOException {
        XMLWorkerHelper.getInstance().parseXHtml(writer, document,
                new BufferedReader(new InputStreamReader(new URL(siteUrl).openStream())));
    }
And here is my Flying-Saucer convert method:
    public static void convertFS(String siteUrl, String fileName) throws com.lowagie.text.DocumentException, IOException {
        // try-with-resources closes the stream even if rendering throws
        try (OutputStream os = new FileOutputStream(fileName)) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(siteUrl);
            renderer.layout();
            renderer.createPDF(os);
        }
    }
Any tips? I'm open to other libraries if they are reasonably usable. Thanks in advance.
You can first parse the HTML file with jsoup, which tolerates unclosed tags and unquoted attributes, then convert the content to a standard, well-formed HTML document, and finally use iText to generate the PDF.
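A minimal sketch of the clean-up step, assuming jsoup is on the classpath. The class name `HtmlCleaner` and the method `toXhtml` are made up for illustration, and the sample markup in `main` is just the two problematic tags from the question wrapped in a tiny page. The idea is to let jsoup parse the lenient HTML and then serialize it with XML syntax, so that every tag is closed and every attribute value is quoted:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

// Hypothetical helper: turns lenient "tag soup" HTML into well-formed XHTML.
final class HtmlCleaner {

    static String toXhtml(String html, String baseUri) {
        // jsoup's HTML parser accepts unclosed tags and unquoted attributes.
        Document doc = Jsoup.parse(html, baseUri);

        // Serialize as XML so void elements are self-closed, and restrict
        // named entities to the ones an XML parser understands.
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml)
           .escapeMode(Entities.EscapeMode.xhtml)
           .charset("UTF-8");

        return doc.html();
    }

    public static void main(String[] args) {
        String messy =
            "<html><head>"
            + "<link rel=File-List href=\"040602_inds_files/filelist.xml\">"
            + "<meta http-equiv=Content-Type content=\"text/html; charset=windows-1252\">"
            + "</head><body><p>Hello</body></html>";

        System.out.println(toXhtml(messy, ""));
    }
}
```

The resulting string can then be handed to the converter from the question through a `java.io.StringReader`, for example `XMLWorkerHelper.getInstance().parseXHtml(writer, document, new StringReader(xhtml))`. To load the real page instead of a string, `Jsoup.connect(siteUrl).get()` returns the same kind of `Document`. Whether the PDF layout is acceptable still depends on how much of the page's CSS the renderer supports, so treat this as a starting point rather than a guaranteed fix.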