I am using the Tesseract Java API (tess4J) to convert TIFF files to readable PDFs.
When I have a single source TIFF file, the results have been quite pleasing:
```java
// handle is an initialised TessBaseAPI handle; dataPath is the tessdata directory
TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate("outpath/my_new_pdf.pdf", dataPath, 0);
TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate("output/my_new_pdf.pdf", dataPath, 0));
int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);
```
However, the API documentation states that you should be able to supply a list of files as well as a single file: "Recognizes all the pages in the named file, as a multi-page tiff or list of filenames, or single image..."
This would be very handy, as I would like to pass in several TIFFs to produce a multi-page PDF with one page per image, but I haven't yet worked out how to pass in a list of images. The obvious first attempt was to pass a comma-separated list of absolute file paths to the TIFFs where the example above passes sourceTiffFile.getAbsolutePath(), but the result is a very small, apparently corrupt PDF file.
Any suggestions would be most welcome.
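Try a file list: a plain-text file with each image path on a separate line (i.e., delimited by the \n character), and pass the path of that text file as the filename argument instead of a single TIFF path.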
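A minimal sketch of building such a list file with standard java.nio calls. The class and method names (FileListWriter, writeFileList) are hypothetical, and it assumes Tesseract reads the file as one path per line as described above; it only writes the list and does not call Tesseract itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: writes a Tesseract-style file list (one image path per line).
public class FileListWriter {

    /**
     * Writes the absolute path of each image to listFile, one per line,
     * separated by '\n'. Returns the absolute path of the list file so it
     * can be passed where a single image path would otherwise go.
     */
    public static String writeFileList(List<Path> images, Path listFile) throws IOException {
        String content = images.stream()
                .map(p -> p.toAbsolutePath().toString()) // absolute paths avoid working-directory surprises
                .collect(Collectors.joining("\n"));      // newline-delimited, not comma-separated
        Files.write(listFile, content.getBytes(StandardCharsets.UTF_8));
        return listFile.toAbsolutePath().toString();
    }
}
```

The returned string would then replace sourceTiffFile.getAbsolutePath() in the TessBaseAPIProcessPages call from the question, with the renderer setup unchanged; each listed image should then become one page of the output PDF.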