I am trying to use the iText7 library to extract some pages from a PDF file to create a new one.
static void Splitter()
{
    string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 $29,514.42 02.12.20.pdf";
    string range = "1, 4, 8";
    var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
    var split = new PdfSplitter(pdfDocumentInvoiceNumber);
    var result = split.ExtractPageRange(new PageRange(range));
    var numberOfPagesPdfDocumentInvoiceNumber = result.GetNumberOfPages();
    String toFile = @"C:\Users\Standard\Downloads\Result\Extracted.pdf";
    var pdfWriter = new PdfWriter(toFile);
    var pdfDocumentInvoiceMergeResult = new PdfDocument(pdfWriter);
    for (var i = 1; i <= numberOfPagesPdfDocumentInvoiceNumber; i++)
    {
        var pdfPage = result.GetPage(i).CopyTo(pdfDocumentInvoiceMergeResult);
        pdfDocumentInvoiceMergeResult.AddPage(pdfPage);
    }
}
But when I attempt to use the CopyTo method, I get the error
iText.Kernel.PdfException: 'Cannot copy indirect object from the document that is being written.'
The problem here is that the documents returned by the PdfSplitter methods, in particular by ExtractPageRange, are iText 7 documents that are being written to, i.e. these PdfDocument instances have been instantiated with a PdfWriter.
Such documents are subject to certain restrictions, in particular that pages cannot be copied from them. For details on this read the answers here and here.
For these result documents (and the whole PdfSplitter class with them) to be of any value, therefore, you need a way to define where the PdfWriter objects of these documents write to. And there is a way, albeit not a really intuitive one: you have to override the GetNextPdfWriter method of the PdfSplitter, which originally looks like this:
/// <summary>This method is called when another split document is to be created.</summary>
/// <remarks>
/// This method is called when another split document is to be created.
/// You can override this method and return your own
/// <see cref="iText.Kernel.Pdf.PdfWriter"/>
/// depending on your needs.
/// </remarks>
/// <param name="documentPageRange">the page range of the original document to be included in the document being created now.
/// </param>
/// <returns>the PdfWriter instance for the document which is being created.</returns>
protected internal virtual PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
    return new PdfWriter(new ByteArrayOutputStream());
}
In a use case like yours, in which you expect a single result document that you eventually want to write to a file, you can override it like this:
class MySplitter : PdfSplitter
{
    public MySplitter(PdfDocument pdfDocument) : base(pdfDocument)
    {
    }

    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        String toFile = @"C:\Users\Standard\Downloads\Result\Extracted.pdf";
        return new PdfWriter(toFile);
    }
}
With the PdfWriter instantiation moved into that custom splitter, your main code is reduced to
string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 $29,514.42 02.12.20.pdf";
string range = "1, 4, 8";
var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
var split = new MySplitter(pdfDocumentInvoiceNumber);
var result = split.ExtractPageRange(new PageRange(range));
result.Close();
In a use case like yours this admittedly looks weird: you have to derive a custom class from PdfSplitter merely to extract a few pages from a source PDF into a result PDF. Wouldn't an additional PdfWriter parameter to ExtractPageRange have made it much easier?
Please be aware, though, that the main objective of the PdfSplitter class is to split documents into many parts using the ExtractPageRanges and SplitBy... methods, and in that situation you'd need to supply a larger, probably not exactly known number of PdfWriters... not easier at all!
Of course, a better solution probably would have been injecting some lambda expression or some other callback mechanism. For example:
class ImprovedSplitter : PdfSplitter
{
    private Func<PageRange, PdfWriter> nextWriter;

    public ImprovedSplitter(PdfDocument pdfDocument, Func<PageRange, PdfWriter> nextWriter) : base(pdfDocument)
    {
        this.nextWriter = nextWriter;
    }

    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        return nextWriter.Invoke(documentPageRange);
    }
}
which you can use like this:
string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 $29,514.42 02.12.20.pdf";
string range = "1, 4, 8";
var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
var split = new ImprovedSplitter(pdfDocumentInvoiceNumber, pageRange => new PdfWriter(@"C:\Users\Standard\Downloads\Result\Extracted.pdf"));
var result = split.ExtractPageRange(new PageRange(range));
result.Close();
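The callback variant also covers the many-parts situation mentioned above. The following is only a sketch under assumptions: the second page range and the Extracted-N.pdf file names are made up for illustration, and it assumes that ExtractPageRanges requests one PdfWriter per page range, in order. The ImprovedSplitter class is repeated so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

// Same callback-based splitter as in the answer, repeated for completeness.
class ImprovedSplitter : PdfSplitter
{
    private readonly Func<PageRange, PdfWriter> nextWriter;

    public ImprovedSplitter(PdfDocument pdfDocument, Func<PageRange, PdfWriter> nextWriter) : base(pdfDocument)
    {
        this.nextWriter = nextWriter;
    }

    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        return nextWriter.Invoke(documentPageRange);
    }
}

static class MultiRangeExample
{
    static void Main()
    {
        string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 $29,514.42 02.12.20.pdf";

        // Hypothetical ranges: one result document is created per entry.
        var ranges = new List<PageRange> { new PageRange("1, 4, 8"), new PageRange("2-3") };

        int part = 0; // only used to build the hypothetical output file names
        var source = new PdfDocument(new PdfReader(file));
        var split = new ImprovedSplitter(source, pageRange =>
            new PdfWriter($@"C:\Users\Standard\Downloads\Result\Extracted-{++part}.pdf"));

        // Each returned document is still open for writing; closing it finishes the file.
        foreach (PdfDocument result in split.ExtractPageRanges(ranges))
        {
            result.Close();
        }
        source.Close();
    }
}
```

Because the callback decides the target for each part, the caller does not need to know the number of result documents in advance.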