
How to merge all PDF files from a PDF Portfolio into a normal PDF file using C# and iText 7?


I took this C# example and tried to get the attachments as a PdfDocument, but I couldn't figure out how to do it.

In the end I would like to merge every PDF file contained in a portfolio into a single "normal" PDF file. Every non-PDF attachment should be ignored.

Edit:

(Okay, sorry for being too vague. By saying what I want to achieve, I simply wanted to make it easier for you guys to help me. I did not want to make you write the program for me.)

So, here's part of the code from the linked example:

protected void ManipulatePdf(String dest)
{
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));

    PdfDictionary root = pdfDoc.GetCatalog().GetPdfObject();
    PdfDictionary names = root.GetAsDictionary(PdfName.Names);
    PdfDictionary embeddedFiles = names.GetAsDictionary(PdfName.EmbeddedFiles);
    PdfArray namesArray = embeddedFiles.GetAsArray(PdfName.Names);
    
    // Remove the description of the embedded file
    namesArray.Remove(0);

    // Remove the reference to the embedded file.
    namesArray.Remove(0);

    pdfDoc.Close();
}

Instead of removing anything from the source document, I would like to know how to get the PdfDocument object(s) out of the PdfArray if possible.

Sample file: http://www.mediafire.com/file/c4tw07wci8swdx9/NPort_5000.pdf/file

Solution by mkl ported to C#:

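// Namespaces used by this snippet; pdfDocument is the already opened portfolio.
using System.Collections.Generic;
using iText.IO.Source;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

PdfNameTree embeddedFilesTree = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);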
IDictionary<string, PdfObject> embeddedFilesMap = embeddedFilesTree.GetNames();
List<PdfStream> embeddedPdfs = new List<PdfStream>();
foreach (PdfObject pdfObject in embeddedFilesMap.Values)
{
    if (!(pdfObject is PdfDictionary))
        continue;
    PdfDictionary filespecDict = (PdfDictionary)pdfObject;
    PdfDictionary embeddedFileDict = filespecDict.GetAsDictionary(PdfName.EF);
    if (embeddedFileDict == null)
        continue;
    PdfStream embeddedFileStream = embeddedFileDict.GetAsStream(PdfName.F);
    if (embeddedFileStream == null)
        continue;
    PdfName subtype = embeddedFileStream.GetAsName(PdfName.Subtype);
    // Equals is null-safe, so attachments without a Subtype entry are simply skipped.
    if (!PdfName.ApplicationPdf.Equals(subtype))
        continue;
    embeddedPdfs.Add(embeddedFileStream);
}

if (embeddedPdfs.Count > 0)
{
    PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().SetFullCompressionMode(true));
    PdfDocument flatPdfDocument = new PdfDocument(pdfWriter);
    PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
    RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
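    foreach (PdfStream pdfStream in embeddedPdfs)
    {
        // Open each embedded PDF from its decoded stream bytes.
        PdfReader embeddedReader = new PdfReader(sourceFactory.CreateSource(pdfStream.GetBytes()), new ReaderProperties());
        PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader);
        pdfMerger.Merge(embeddedPdfDocument, 1, embeddedPdfDocument.GetNumberOfPages());
        // Release the embedded document once its pages are copied (as the Java version does).
        embeddedPdfDocument.Close();
    }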
    flatPdfDocument.Close();
}

Solution

  • To merge all PDF files from a PDF Portfolio into a normal PDF file, you have to walk the EmbeddedFiles name tree, retrieve the streams of all PDFs in it, and then merge these PDFs.

    You can do this as follows for a portfolio loaded in a PdfDocument pdfDocument (Java version; the OP edited a port to C# into his question body):

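    // Imports needed: java.util.ArrayList, java.util.List, java.util.Map,
    // com.itextpdf.io.source.RandomAccessSourceFactory, com.itextpdf.kernel.pdf.*,
    // com.itextpdf.kernel.utils.PdfMerger
    PdfNameTree embeddedFilesTree = pdfDocument.getCatalog().getNameTree(PdfName.EmbeddedFiles);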
    Map<String, PdfObject> embeddedFilesMap = embeddedFilesTree.getNames();
    List<PdfStream> embeddedPdfs = new ArrayList<PdfStream>();
    for (Map.Entry<String, PdfObject> entry : embeddedFilesMap.entrySet()) {
        PdfObject pdfObject = entry.getValue();
        if (!(pdfObject instanceof PdfDictionary))
            continue;
        PdfDictionary filespecDict = (PdfDictionary) pdfObject;
        PdfDictionary embeddedFileDict = filespecDict.getAsDictionary(PdfName.EF);
        if (embeddedFileDict == null)
            continue;
        PdfStream embeddedFileStream = embeddedFileDict.getAsStream(PdfName.F);
        if (embeddedFileStream == null)
            continue;
        PdfName subtype = embeddedFileStream.getAsName(PdfName.Subtype);
        if (!PdfName.ApplicationPdf.equals(subtype))
            continue;
        embeddedPdfs.add(embeddedFileStream);
    }
    
    Assert.assertFalse("No embedded PDFs found", embeddedPdfs.isEmpty());
    
    try (   PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().setFullCompressionMode(true));
            PdfDocument flatPdfDocument = new PdfDocument(pdfWriter)    ) {
        PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
        RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
        for (PdfStream pdfStream : embeddedPdfs) {
            try (   PdfReader embeddedReader = new PdfReader(sourceFactory.createSource(pdfStream.getBytes()), new ReaderProperties());
                    PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader)) {
                pdfMerger.merge(embeddedPdfDocument, 1, embeddedPdfDocument.getNumberOfPages());
            }
        }
    }
    

    (FlattenPortfolio test testFlattenNPort_5000)
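
One caveat about the Subtype filter used above: the Subtype entry of an embedded file stream is optional in the PDF specification, so some portfolios may contain PDFs that the check skips. A possible fallback, sketched here with plain Java and no iText calls, is to look for the `%PDF-` header marker in the decoded bytes. The class name `PdfHeaderSniffer`, the method `looksLikePdf`, and the 1024-byte search window (a leniency Acrobat-style viewers are commonly described as applying) are assumptions made for this illustration, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses whether a byte array holds a PDF by its header marker.
final class PdfHeaderSniffer {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumption: accept the marker anywhere in the first 1024 bytes, not only at offset 0.
    private static final int SEARCH_WINDOW = 1024;

    static boolean looksLikePdf(byte[] bytes) {
        if (bytes == null) {
            return false;
        }
        // Last offset at which the whole marker still fits inside the search window.
        int lastStart = Math.min(bytes.length, SEARCH_WINDOW) - PDF_MAGIC.length;
        for (int start = 0; start <= lastStart; start++) {
            if (matchesAt(bytes, start)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesAt(byte[] bytes, int start) {
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (bytes[start + i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfLike = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] zipLike = "PK\u0003\u0004".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdfLike)); // true
        System.out.println(looksLikePdf(zipLike)); // false
    }
}
```

In the loop above, the filter could then skip an attachment only when both checks fail, i.e. when `PdfName.ApplicationPdf.equals(subtype)` is false and `PdfHeaderSniffer.looksLikePdf(embeddedFileStream.getBytes())` is false as well. Whether this fallback is needed depends on how the portfolio was created.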