How to pass a list of TIFF files to TessAPI1.TessBaseAPIProcessPages


I am using the Tesseract Java API (tess4J) to convert TIFF files to readable PDFs.

When I have a single source TIFF file, the results have been quite pleasing:

        TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate("outpath/my_new_pdf.pdf", dataPath, 0);
        TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate("output/my_new_pdf.pdf", dataPath, 0));
        int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);

However, the API documentation states that you should be able to supply a list of files as well as a single file: "Recognizes all the pages in the named file, as a multi-page tiff or list of filenames, or single image..."

This would be very handy, as I would like to pass in several TIFFs to produce a multi-page PDF, one page per image, but I haven't yet been able to work out how to pass in a list of images. The obvious first attempt was to pass in a comma-separated list of absolute file paths to the TIFFs in place of sourceTiffFile.getAbsolutePath() in the example above, but the result is a very small, apparently corrupt PDF file.

Any suggestions would be most welcome.


Solution

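  • Try a file list with each entry on a separate line (i.e., delimited by a \n character). In Tesseract this is normally a plain-text file containing one image path per line; you pass the path of that text file as the filename argument instead of the path of a single TIFF.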
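
As a rough illustration of the suggestion above, the sketch below builds such a file list using only the Java standard library. The class name TiffFileList and the helpers buildFileList and writeFileList are hypothetical names chosen for this example and are not part of Tess4J. It assumes Tesseract accepts a text file with one absolute image path per line, and it has not been checked against a particular Tess4J version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TiffFileList {

    // Joins the absolute path of every image with '\n', one entry per line.
    // (Hypothetical helper, not a Tess4J method.)
    static String buildFileList(List<Path> images) {
        StringBuilder sb = new StringBuilder();
        for (Path image : images) {
            sb.append(image.toAbsolutePath()).append('\n');
        }
        return sb.toString();
    }

    // Writes the list to a temporary text file and returns that file's path.
    static Path writeFileList(List<Path> images) throws IOException {
        Path listFile = Files.createTempFile("tiff-list", ".txt");
        Files.write(listFile, buildFileList(images).getBytes(StandardCharsets.UTF_8));
        return listFile;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java TiffFileList page1.tif page2.tif ...");
            return;
        }
        List<Path> images = new ArrayList<>();
        for (String arg : args) {
            images.add(Paths.get(arg));
        }
        Path listFile = writeFileList(images);
        // Assumption: this path replaces sourceTiffFile.getAbsolutePath() in the
        // TessBaseAPIProcessPages call shown in the question.
        System.out.println(listFile.toAbsolutePath());
    }
}
```

With the list file written, the call from the question would presumably receive listFile.toString() as its filename argument, leaving the renderer setup unchanged. The '\n' separator is written explicitly rather than relying on the platform line separator, so the file has the same layout on every operating system.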