c# pdf tableofcontents gembox-pdf

Merge PDF files with TOC element


I'm merging PDF files using GemBox.Pdf as shown here. This works great and I can easily add outlines.

I've previously done a similar thing and merged Word files with GemBox.Document as shown here.

But now my problem is that there is no TOC element in GemBox.Pdf. I want a Table of Contents to be generated automatically when merging multiple PDF files into one.

Am I missing something or is there really no such element for PDF?
Do I need to recreate it myself? If so, how would I do that?
I can add a bookmark, but I don't know how to add a link to it.


Solution

  • EDIT:

    Simpler code is available in the Create Table of Contents in PDF example. The concept is the same, but it uses PdfLinkAnnotation objects, which simplifies updating the TOC links.

    ORIGINAL:

    There is no such element in PDF files, so we need to create this content ourselves.

    Now one way would be to create text elements, outlines, and link annotations, position them appropriately, and set the link destinations to outlines.

    However, that could be quite a lot of work, so it may be easier to create the desired TOC element with GemBox.Document, save it as a PDF file, and then import it into the resulting PDF.

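    // Using directives this snippet is assumed to rely on (GemBox.Document and GemBox.Pdf packages).
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using GemBox.Document;
    using GemBox.Pdf;
    using GemBox.Pdf.Objects;

    // Source data for creating TOC entries with specified text and associated PDF files.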
    var pdfEntries = new[]
    {
        new { Title = "First Document Title", Pdf = PdfDocument.Load("input1.pdf") },
        new { Title = "Second Document Title", Pdf = PdfDocument.Load("input2.pdf") },
        new { Title = "Third Document Title", Pdf = PdfDocument.Load("input3.pdf") },
    };
    
    /***************************************************************/
    /* Create new document with TOC element using GemBox.Document. */
    /***************************************************************/
    
    // Create new document.
    var tocDocument = new DocumentModel();
    var section = new Section(tocDocument);
    tocDocument.Sections.Add(section);
    
    // Create and add TOC element.
    var toc = new TableOfEntries(tocDocument, FieldType.TOC);
    section.Blocks.Add(toc);
    section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    
    // Create heading style.
    // By default, updating the TOC element creates a TOC entry for each paragraph that has a heading style.
    var heading1Style = (ParagraphStyle)tocDocument.Styles.GetOrAdd(StyleTemplateType.Heading1);
    
    // Add heading and empty (placeholder) pages.
    // The number of placeholder pages added depends on the number of pages in the actual PDF file, so that TOC entries have correct page numbers.
    int totalPageCount = 0;
    foreach (var pdfEntry in pdfEntries)
    {
        section.Blocks.Add(new Paragraph(tocDocument, pdfEntry.Title) { ParagraphFormat = { Style = heading1Style } });
        section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    
        int currentPageCount = pdfEntry.Pdf.Pages.Count;
        totalPageCount += currentPageCount;
    
        while (--currentPageCount > 0)
            section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    }
    
    // Remove the extra empty page that was added after the last entry.
    section.Blocks.RemoveAt(section.Blocks.Count - 1);
    
    // Update TOC element and save the document as PDF stream.
    toc.Update();
    var pdfStream = new MemoryStream();
    tocDocument.Save(pdfStream, new GemBox.Document.PdfSaveOptions());
    
    /***************************************************************/
    /* Merge PDF files into PDF with TOC element using GemBox.Pdf. */
    /***************************************************************/
    
    // Load a PDF stream using GemBox.Pdf.
    var pdfDocument = PdfDocument.Load(pdfStream);
    var rootDictionary = (PdfDictionary)((PdfIndirectObject)pdfDocument.GetDictionary()[PdfName.Create("Root")]).Value;
    var pagesDictionary = (PdfDictionary)((PdfIndirectObject)rootDictionary[PdfName.Create("Pages")]).Value;
    var kidsArray = (PdfArray)pagesDictionary[PdfName.Create("Kids")];
    var pageIds = kidsArray.Cast<PdfIndirectObject>().Select(obj => obj.Id).ToArray();
    
    // Remove empty (placeholder) pages.
    while (totalPageCount-- > 0)
        pdfDocument.Pages.RemoveAt(pdfDocument.Pages.Count - 1);
    
    // Add pages from PDF files.
    foreach (var pdfEntry in pdfEntries)
        foreach (var page in pdfEntry.Pdf.Pages)
            pdfDocument.Pages.AddClone(page);
    
    /*****************************************************************************/
    /* Update TOC links from placeholder pages to actual pages using GemBox.Pdf. */
    /*****************************************************************************/
    
    // Create a mapping from the ID of an empty (placeholder) page indirect object to the actual page indirect object.
    var pageCloneMap = new Dictionary<PdfIndirectObjectIdentifier, PdfIndirectObject>();
    for (int i = 0; i < kidsArray.Count; ++i)
        pageCloneMap.Add(pageIds[i], (PdfIndirectObject)kidsArray[i]);
    
    foreach (var entry in pageCloneMap)
    {
        // If page was updated, it means that we passed TOC pages, so break from the loop.
        if (entry.Key != entry.Value.Id)
            break;
    
        // For each TOC page, get its 'Annots' entry.
        // For each link annotation from the 'Annots' get the 'Dest' entry.
        // Update the first item in the 'Dest' array so that it no longer points to a removed page.
        if (((PdfDictionary)entry.Value.Value).TryGetValue(PdfName.Create("Annots"), out PdfBasicObject annotsObj))
            foreach (PdfIndirectObject annotObj in (PdfArray)annotsObj)
                if (((PdfDictionary)annotObj.Value).TryGetValue(PdfName.Create("Dest"), out PdfBasicObject destObj))
                {
                    var destArray = (PdfArray)destObj;
                    destArray[0] = pageCloneMap[((PdfIndirectObject)destArray[0]).Id];
                }
    }
    
    // Save resulting PDF file.
    pdfDocument.Save("Result.pdf");
    pdfDocument.Close();
    

    This way, you can easily customize the TOC element by using TOC switches and styles. For more information, see the Table of Contents example for GemBox.Document.
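
The placeholder-page trick above only works if the page counts line up, so here is a minimal sketch of just that arithmetic, with no GemBox calls. `TocLayout` and its methods are hypothetical names made up for illustration. The sketch assumes the TOC occupies a known number of pages, every input PDF has at least one page, and each heading fits on a single page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of GemBox): models only the page bookkeeping
// behind the placeholder approach.
public static class TocLayout
{
    // 1-based page number on which each merged PDF starts in the final file.
    // This is the number the TOC entry should show after toc.Update().
    public static int[] StartPages(int tocPageCount, IReadOnlyList<int> pageCounts)
    {
        if (tocPageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(tocPageCount));

        var starts = new int[pageCounts.Count];
        int next = tocPageCount + 1; // first page after the TOC

        for (int i = 0; i < pageCounts.Count; i++)
        {
            if (pageCounts[i] < 1)
                throw new ArgumentException("Each PDF is assumed to have at least one page.");

            starts[i] = next;
            next += pageCounts[i]; // heading page plus (pageCount - 1) empty pages
        }

        return starts;
    }

    // Total page count of the placeholder document and of the final merged file.
    // They must be equal, otherwise a positional old-page-to-new-page mapping breaks.
    public static int TotalPages(int tocPageCount, IReadOnlyList<int> pageCounts)
        => tocPageCount + pageCounts.Sum();
}
```

For a one-page TOC and input files with 3, 1, and 2 pages, `TocLayout.StartPages(1, new[] { 3, 1, 2 })` gives 2, 5, and 6, and `TocLayout.TotalPages` gives 7. Because the placeholder document and the merged document have the same page count, the page at index i before the swap corresponds to the page at index i after it.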