
Cannot copy indirect object from the document that is being written (Java)


I've created a method like this:

  public PdfDocument addBlankPage(final MediaModel pdfDocument) throws IOException {

    final InputStream inputStream = mediaService.getStreamFromMedia(pdfDocument);
    byte[] bytes = IOUtils.toByteArray(inputStream);
    final PdfReader reader = new PdfReader(new ByteArrayInputStream(bytes));
    final PdfWriter writer = new PdfWriter(pdfDocument.getRealFileName());
    final PdfDocument document = new PdfDocument(reader, writer);
    int index = document.getNumberOfPages();
    final PageSize ps = new PageSize(document.getFirstPage().getPageSize());
    document.addNewPage(index + 1, ps);
    reader.close();
    writer.close();
    return document;

}

I use it to add a new blank page to a PdfDocument, and it looks fine and seems to work. However, when I try to merge a PdfDocument with a blank page (added by my method) with other existing PDF documents in this method:

 public .... {

    ByteArrayOutputStream mergedPdfStream = new ByteArrayOutputStream();
    PdfDocument mergedPdf = new PdfDocument(new PdfWriter(mergedPdfStream));

    for (PdfDocument doc : pdfDocuments) {
        int n = doc.getNumberOfPages();

        for (int i = 1; i <= n; i++) {

            PdfPage page = doc.getPage(i).copyTo(mergedPdf);
            mergedPdf.addPage(page);

        }
    }
    ....

}

It throws:

 com.itextpdf.kernel.PdfException: Cannot copy indirect object from the document that is being written.
at com.itextpdf.kernel.pdf.PdfObject.copyTo(PdfObject.java:318) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfDictionary.copyTo(PdfDictionary.java:443) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfPage.copyTo(PdfPage.java:379) ~[kernel-7.1.1.jar:?]
at com.itextpdf.kernel.pdf.PdfPage.copyTo(PdfPage.java:364) ~[kernel-7.1.1.jar:?]

I googled it and didn't find any relevant information. Any hints?

P.S. I'm 100% sure that my method is the culprit, because when I merge other PDFs without using the blank-page method, it always works.


Solution

  • What you have observed in this and in your previous question is due to a peculiarity of the iText PdfDocument class: while it does represent a PDF document, it does not hold all of it in memory or in some accessible storage. In particular, if you add content to it, this new content is by default flushed out of memory to the PdfWriter as soon as possible, making it inaccessible to the PdfDocument.

    This enables you to keep the memory footprint fairly low while creating large PDFs with iText, a factor which can be very relevant in high-throughput applications.

    The downside is that there are restrictions on the use of PdfDocument instances; in particular, you cannot freely copy from instances that have been written to, as the current state of the data to copy might not be retrievable anymore.

    To prevent you from copying inconsistent data, iText disallows copying from PdfDocument instances which can be written to, i.e. which have a PdfWriter.

    Thus,

    • if you want to copy from a document, the PdfDocument needs to be initialized without a PdfWriter;
    • if you want to (non-trivially) change a document, the PdfDocument needs to be initialized with a PdfWriter;
    • so if you want to change and copy from a document, you cannot use the same PdfDocument instance for both actions!
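The rule above can be checked with a minimal sketch, assuming the iText 7 kernel module is on the classpath. The class name CopyRuleDemo and the helpers samplePdf and canCopyFrom are made up for this illustration, and the PDF is created in memory only to have something to copy from. The exception is caught through its RuntimeException supertype to keep the imports short.

```java
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class; not part of iText or of the question's code.
class CopyRuleDemo {

    /** Demo input only: an in-memory PDF with one blank A4 page. */
    static byte[] samplePdf() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PdfDocument doc = new PdfDocument(new PdfWriter(out))) {
            doc.addNewPage(PageSize.A4);
        }
        return out.toByteArray();
    }

    /**
     * Opens the PDF with or without a PdfWriter and tries to copy page 1
     * into a fresh target document. Returns false if the copy is refused.
     */
    static boolean canCopyFrom(byte[] pdf, boolean withWriter) throws IOException {
        try (PdfDocument source = withWriter
                     ? new PdfDocument(new PdfReader(new ByteArrayInputStream(pdf)),
                                       new PdfWriter(new ByteArrayOutputStream()))
                     : new PdfDocument(new PdfReader(new ByteArrayInputStream(pdf)));
             PdfDocument target = new PdfDocument(new PdfWriter(new ByteArrayOutputStream()))) {
            try {
                target.addPage(source.getPage(1).copyTo(target));
                return true;
            } catch (RuntimeException e) {
                // iText reports the refusal as a PdfException (a RuntimeException).
                // A document without pages cannot be closed, so add one before leaving.
                target.addNewPage();
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = samplePdf();
        System.out.println("copy from PdfDocument with PdfWriter:    " + canCopyFrom(pdf, true));
        System.out.println("copy from PdfDocument without PdfWriter: " + canCopyFrom(pdf, false));
    }
}
```

With the 7.1.1 version from your stack trace I would expect the first call to report false and the second to report true. Note that the refusal does not depend on the blank page: any page of a document opened with a PdfWriter is affected.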

    For your use case, therefore, you have to

    • either take the output of the PdfDocument with PdfWriter after applying the changes and use it as input of a PdfDocument without PdfWriter to copy from;
    • or open two separate PdfDocument instances from the source file, one with and one without a PdfWriter, and apply the changes to the first and copy from the second.

    The former option is necessary if the data you want to copy shall contain the changes you apply. The latter is necessary if they shall not contain them. If you don't care either way or if you know the copied data is not influenced by the changes, either option is ok.
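The latter option is not needed for your use case, but for completeness here is a rough sketch of it, again assuming the iText 7 kernel module on the classpath. The class TwoInstancesSketch and its helpers samplePdf, countPages and changeAndCopyOriginal are hypothetical names, and the source PDF is generated in memory purely as demo input.

```java
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class; not part of iText or of the question's code.
class TwoInstancesSketch {

    /** Demo input only: an in-memory PDF with the given number of blank A4 pages. */
    static byte[] samplePdf(int pages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PdfDocument doc = new PdfDocument(new PdfWriter(out))) {
            for (int i = 0; i < pages; i++) {
                doc.addNewPage(PageSize.A4);
            }
        }
        return out.toByteArray();
    }

    /** Opens the bytes without a PdfWriter and returns the page count. */
    static int countPages(byte[] pdf) throws IOException {
        try (PdfDocument doc = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdf)))) {
            return doc.getNumberOfPages();
        }
    }

    /**
     * Opens the same source twice: once with a PdfWriter to change it,
     * once without to copy from it.
     * Returns { changed source, target holding copies of the unchanged pages }.
     */
    static byte[][] changeAndCopyOriginal(byte[] source) throws IOException {
        ByteArrayOutputStream changed = new ByteArrayOutputStream();
        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        try (PdfDocument toChange = new PdfDocument(
                     new PdfReader(new ByteArrayInputStream(source)), new PdfWriter(changed));
             PdfDocument toCopyFrom = new PdfDocument(
                     new PdfReader(new ByteArrayInputStream(source)));
             PdfDocument target = new PdfDocument(new PdfWriter(copied))) {
            // The change goes to the instance that has a PdfWriter ...
            toChange.addNewPage(new PageSize(toChange.getFirstPage().getPageSize()));
            // ... while the pages are copied from the instance that has none.
            toCopyFrom.copyPagesTo(1, toCopyFrom.getNumberOfPages(), target);
        }
        return new byte[][] { changed.toByteArray(), copied.toByteArray() };
    }

    public static void main(String[] args) throws IOException {
        byte[][] result = changeAndCopyOriginal(samplePdf(2));
        System.out.println("pages in changed source: " + countPages(result[0]));
        System.out.println("pages in copy target:    " + countPages(result[1]));
    }
}
```

Because the copy source never sees the added page, the target should end up with one page fewer than the changed document, which is exactly why this option does not fit when the change is supposed to be copied along.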


    In your case you copy all pages from all documents in pdfDocuments to a target document, so in particular you want the changes you applied to be copied to the target as well. Thus, the former option applies: you have to take the output of the PdfDocument with PdfWriter after applying the changes and use it as input of a PdfDocument without PdfWriter to copy from.

    You can do so by changing your addBlankPage like this:

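    public PdfDocument addBlankPage(final MediaModel pdfDocument) throws IOException {
        // Resources are closed in reverse order, so the document is closed
        // (and its output completely written) before writer and reader.
        try (   InputStream inputStream = mediaService.getStreamFromMedia(pdfDocument);
                PdfReader reader = new PdfReader(inputStream);
                PdfWriter writer = new PdfWriter(pdfDocument.getRealFileName());
                PdfDocument document = new PdfDocument(reader, writer)) {
            // getPageSize() returns a Rectangle, addNewPage expects a PageSize
            document.addNewPage(new PageSize(document.getFirstPage().getPageSize()));
        }
        // Reopen the result without a PdfWriter so that pages can be copied from it;
        // the caller is responsible for closing the returned document.
        return new PdfDocument(new PdfReader(pdfDocument.getRealFileName()));
    }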
    

    or if you don't actually want to write the PDF into the file system:

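    public PdfDocument addBlankPage(final MediaModel pdfDocument) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the document at the end of the try block completes the PDF in baos.
        try (   InputStream inputStream = mediaService.getStreamFromMedia(pdfDocument);
                PdfReader reader = new PdfReader(inputStream);
                PdfWriter writer = new PdfWriter(baos);
                PdfDocument document = new PdfDocument(reader, writer)) {
            // getPageSize() returns a Rectangle, addNewPage expects a PageSize
            document.addNewPage(new PageSize(document.getFirstPage().getPageSize()));
        }
        // Reopen the in-memory result without a PdfWriter so that pages can be copied from it;
        // the caller is responsible for closing the returned document.
        return new PdfDocument(new PdfReader(new ByteArrayInputStream(baos.toByteArray())));
    }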
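
To see the whole round trip in one place, here is a hedged end-to-end sketch. It is a variation on the in-memory variant above, not a drop-in replacement: it works on plain byte arrays instead of your MediaModel, and the names BlankPageMergeSketch, samplePdf, countPages, withBlankPage and merge are invented for the example. It assumes the iText 7 kernel module on the classpath and that every input has at least one page.

```java
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of iText or of the question's code.
class BlankPageMergeSketch {

    /** Demo input only: an in-memory PDF with the given number of blank A4 pages. */
    static byte[] samplePdf(int pages) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PdfDocument doc = new PdfDocument(new PdfWriter(out))) {
            for (int i = 0; i < pages; i++) {
                doc.addNewPage(PageSize.A4);
            }
        }
        return out.toByteArray();
    }

    /** Opens the bytes without a PdfWriter and returns the page count. */
    static int countPages(byte[] pdf) throws IOException {
        try (PdfDocument doc = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdf)))) {
            return doc.getNumberOfPages();
        }
    }

    /** Appends a blank page and returns the finished PDF as bytes. */
    static byte[] withBlankPage(byte[] pdf) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (PdfDocument document = new PdfDocument(
                new PdfReader(new ByteArrayInputStream(pdf)), new PdfWriter(baos))) {
            document.addNewPage(new PageSize(document.getFirstPage().getPageSize()));
        }
        return baos.toByteArray();
    }

    /** Copies all pages of all inputs into one new PDF; each source is opened without a PdfWriter. */
    static byte[] merge(List<byte[]> pdfs) throws IOException {
        ByteArrayOutputStream mergedStream = new ByteArrayOutputStream();
        try (PdfDocument merged = new PdfDocument(new PdfWriter(mergedStream))) {
            for (byte[] pdf : pdfs) {
                try (PdfDocument source = new PdfDocument(
                        new PdfReader(new ByteArrayInputStream(pdf)))) {
                    source.copyPagesTo(1, source.getNumberOfPages(), merged);
                }
            }
        }
        return mergedStream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] first = withBlankPage(samplePdf(2)); // two pages plus the added blank one
        byte[] second = samplePdf(1);
        byte[] merged = merge(Arrays.asList(first, second));
        System.out.println("pages after merge: " + countPages(merged));
    }
}
```

Passing bytes around instead of open PdfDocument instances is a design choice of this sketch: every document is opened and closed inside a single try-with-resources block, so nothing is left open for the caller to clean up.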