
PDFBox 2.0 Read single page and write/save to a new File


Based on this SO question, I tried to iterate over every single page of a PDF file. My goal is to replace pages that contain no text content but do contain images with completely blank pages. The PDF can contain such "blank" pages that still carry images, and they must remain in the document because it is going to be printed in duplex.

With PDFBox 2.0 this seems to be a bit more complicated, because I get an exception every time I try to save the newly generated PDDocument. Should this be done differently in PDFBox 2.0? Should I avoid closing the buffer PDDocument? If I leave that call out, the sample program runs without an exception, but what are the potential side effects?

A simple runnable example is shown below. You can use any PDF file; the result should be a PDF with the same number of pages, all of them empty:

public static void main(String[] args) throws IOException {
    // Load a simple pdf file
    PDDocument d = PDDocument.load(new File("D:\\test.pdf"));
    // This should be our new output pdf
    PDDocument c = new PDDocument();
    for(int i = 0;i<d.getNumberOfPages();++i) {
        // From the SO question, create a new PDDocument and just add the single page
        PDDocument buffer = new PDDocument();
        PDPage page = d.getPage(i);
        buffer.addPage(page);

        // Here I would check whether the page has text content; omitted for brevity

        // Reassign the page variable to generate a "blank" pdf
        page = new PDPage(); 

        // In order to let some printers not ignore the blank page I have to 
        // write white text on the white background.
        PDPageContentStream contentStream = new PDPageContentStream(buffer, page);

        PDFont font = PDType1Font.HELVETICA_BOLD;
        contentStream.beginText();
        contentStream.setNonStrokingColor(Color.white); // !!!!!!
        contentStream.setFont( font, 6 );
        contentStream.newLineAtOffset(100, 700);
        contentStream.showText("Empty page");
        contentStream.endText();
        contentStream.close();
        // Close the buffer document; if I comment this out, the exception is gone
        buffer.close();
        // Add the blank page
        c.addPage(page);
    }
    d.close();
    // The exception occurs here and seems to be connected with the closing of the buffer document
    c.save("D:\\newtest.pdf");
    c.close();
}

The stack trace:

Exception in thread "main" java.io.IOException: Scratch file already closed
at org.apache.pdfbox.io.ScratchFile.checkClosed(ScratchFile.java:390)
at org.apache.pdfbox.io.ScratchFileBuffer.checkClosed(ScratchFileBuffer.java:99)
at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:295)
at org.apache.pdfbox.io.RandomAccessInputStream.restorePosition(RandomAccessInputStream.java:47)
at org.apache.pdfbox.io.RandomAccessInputStream.read(RandomAccessInputStream.java:78)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.pdfbox.io.IOUtils.copy(IOUtils.java:66)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1134)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:372)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:533)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:450)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1034)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:409)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1284)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1185)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1110)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1082)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1070)
at pdftools.Test.main(Test.java:41)

Solution

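  • Your code is somewhat confusing, but the core of the problem is that in PDFBox 2.0 you should not close a document while its pages are still used in another document. Here the content stream of the blank page is created through "buffer" (new PDPageContentStream(buffer, page)), so its data lives in the scratch file of "buffer". Closing "buffer" closes that scratch file, and c.save() then fails with "Scratch file already closed".

    So here are some solutions:

    • don't close the buffer document; instead keep these documents open until you are done
    • create the page and its content twice
    • create the new page for the destination document only (why do you create it for "buffer", which you are discarding anyway?)
    • instead of duplicating pages with addPage(), use importPage(). This will make a deep copy.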
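
A minimal sketch of the third and fourth suggestions combined, assuming PDFBox 2.0 on the classpath. The class name BlankPageReplacer, the method replacePages and the placeholder check shouldBlank are hypothetical names introduced for illustration; the real content check is left out, as in the question. The blank page and its content stream are created directly against the destination document, pages that should be kept are brought over with importPage(), and the source document stays open until the result has been saved, which is the conservative choice.

```java
import java.awt.Color;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class BlankPageReplacer {

    // Hypothetical placeholder for the "no text, only images" check that the
    // question leaves out. It blanks every page, like the original example.
    static boolean shouldBlank(PDPage page) {
        return true;
    }

    static void replacePages(File in, File out) throws IOException {
        // Both documents stay open until the save below has finished;
        // try-with-resources closes them afterwards.
        try (PDDocument source = PDDocument.load(in);
             PDDocument target = new PDDocument()) {

            for (PDPage original : source.getPages()) {
                if (shouldBlank(original)) {
                    // Create the blank page for the destination document only.
                    PDPage blank = new PDPage();
                    target.addPage(blank);

                    // The content stream is tied to "target", so no other
                    // document's scratch file is involved.
                    try (PDPageContentStream cs = new PDPageContentStream(target, blank)) {
                        cs.beginText();
                        cs.setNonStrokingColor(Color.WHITE); // white text on white page
                        cs.setFont(PDType1Font.HELVETICA_BOLD, 6);
                        cs.newLineAtOffset(100, 700);
                        cs.showText("Empty page");
                        cs.endText();
                    }
                } else {
                    // Keep the page: import it instead of sharing it via addPage().
                    target.importPage(original);
                }
            }

            // Save while the source document is still open.
            target.save(out);
        }
    }

    public static void main(String[] args) throws IOException {
        replacePages(new File("D:\\test.pdf"), new File("D:\\newtest.pdf"));
    }
}
```

With this layout there is no intermediate "buffer" document at all, so nothing is closed before c.save() needs it.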