Split PDF by chapters from Table Of Contents

I'm using GemBox.Pdf and I need to extract the individual chapters of a PDF file as separate PDF files.

The first page (and possibly the second page as well) contains the TOC (Table Of Contents), and I need to split the rest of the PDF pages based on it:

[Screenshot: PDF file with chapters and Table Of Contents]

Also, the split PDF documents should be named after the chapters they contain.
I can already split the PDF by a fixed number of pages per document (I figured that out using this example):

using (var source = PdfDocument.Load("Chapters.pdf"))
{
    int pagesPerSplit = 3;
    int count = source.Pages.Count;

    // Start at 1 to skip the TOC on the first page.
    for (int index = 1; index < count; index += pagesPerSplit)
    {
        using (var destination = new PdfDocument())
        {
            // Stop early if the last split has fewer pages left.
            for (int splitIndex = 0; splitIndex < pagesPerSplit && index + splitIndex < count; splitIndex++)
                destination.Pages.AddClone(source.Pages[index + splitIndex]);

            destination.Save("Chapter " + index + ".pdf");
        }
    }
}

But I can't figure out how to read and process that TOC and split the chapters based on its items.


  • EDIT:

    On the same page that you linked, there is now a "Split PDF file by bookmarks (outlines)" example.


    You should iterate through the document's bookmarks (outlines) and split the document based on each bookmark's destination page.

    For instance, try this:

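    // Needs: using System.Collections.Generic; using System.Linq; using GemBox.Pdf;
    using (var source = PdfDocument.Load("Chapters.pdf"))
    {
        PdfOutlineCollection outlines = source.Outlines;
        PdfPages pages = source.Pages;
        // Map each page to its zero-based index in the document.
        Dictionary<PdfPage, int> pageIndexes = pages
            .Select((page, index) => new { page, index })
            .ToDictionary(item => item.page, item => item.index);
        for (int index = 0, count = outlines.Count; index < count; ++index)
        {
            PdfOutline outline = outlines[index];
            PdfOutline nextOutline = index + 1 < count ? outlines[index + 1] : null;
            // A chapter runs from its bookmark's page up to the next bookmark's page (or the end).
            int pageStartIndex = pageIndexes[outline.Destination.Page];
            int pageEndIndex = nextOutline != null ?
                pageIndexes[nextOutline.Destination.Page] :
                pages.Count;
            using (var destination = new PdfDocument())
            {
                while (pageStartIndex < pageEndIndex)
                    destination.Pages.AddClone(pages[pageStartIndex++]);
                destination.Save($"{outline.Title}.pdf");
            }
        }
    }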
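
    The page-range logic in that snippet can be checked on its own, without a PDF at hand. The following is only a sketch: `ChapterRanges.Compute` is a hypothetical helper (not part of GemBox.Pdf), and it assumes the bookmarks are listed in document order, so each chapter ends where the next one starts.

    ```csharp
    using System;
    using System.Collections.Generic;

    static class ChapterRanges
    {
        // Hypothetical helper: given each chapter's zero-based start page (in document
        // order) and the total page count, returns half-open [Start, End) page ranges.
        public static List<(int Start, int End)> Compute(IReadOnlyList<int> starts, int pageCount)
        {
            var ranges = new List<(int Start, int End)>();
            for (int i = 0; i < starts.Count; i++)
            {
                // A chapter ends where the next one begins; the last one ends at the final page.
                int end = i + 1 < starts.Count ? starts[i + 1] : pageCount;
                ranges.Add((starts[i], end));
            }
            return ranges;
        }
    }
    ```

    For example, with chapters starting on pages 1, 4 and 9 of a 12-page document, `ChapterRanges.Compute(new[] { 1, 4, 9 }, 12)` gives the ranges (1, 4), (4, 9) and (9, 12). Page 0 (the TOC) is left out simply because no chapter bookmark points to it.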

    Note that, judging from the screenshot, your chapter bookmarks include the ordinal number (in Roman numerals). If needed, you can easily remove it with something like this:

    destination.Save($"{outline.Title.Substring(outline.Title.IndexOf(' ') + 1)}.pdf");
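
    Bookmark titles can also contain characters that are not allowed in file names (a slash, for instance), so it may be worth cleaning the title before saving. Below is a minimal sketch of that idea; `ChapterFileNames.FromTitle` is a hypothetical helper, and it assumes every title starts with a numeral followed by a space, just like the `Substring` call above.

    ```csharp
    using System;
    using System.IO;
    using System.Linq;

    static class ChapterFileNames
    {
        // Hypothetical helper: turns a bookmark title such as "II. Getting Started"
        // into a file name such as "Getting Started.pdf".
        public static string FromTitle(string title)
        {
            // Drop everything up to the first space (the numeral), if there is one.
            int space = title.IndexOf(' ');
            string name = space >= 0 ? title.Substring(space + 1) : title;

            // Replace characters the current platform does not allow in file names.
            char[] invalid = Path.GetInvalidFileNameChars();
            name = new string(name.Select(c => invalid.Contains(c) ? '_' : c).ToArray());

            return name.Trim() + ".pdf";
        }
    }
    ```

    It could then be used as `destination.Save(ChapterFileNames.FromTitle(outline.Title));`. A title without a leading numeral but with spaces would lose its first word, so only use it when all bookmarks follow the same pattern.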