
iTextSharp get Actions from Table Of Content PDF C#


I have a PDF with a TOC:

[Screenshot: Table of Contents]

Using iTextSharp.dll, I'm trying to get the annotations and then the actions of these annotations. Then I want to change the link so that it points to another page. For example, if Chapter 1 in the TOC points to page 5, I would like it to point to page 2 when I click on the link. For some reason the action of the annotation is null, so I cannot manipulate this data. The code below runs, but it keeps returning a null action, and I don't understand why. To reproduce the PDF in question:

  • Create a Word document with 3 pages.
  • Page 1 will be the table of contents, page 2 Chapter 1, page 3 Chapter 2.
  • Export to PDF.
  • Once you have the PDF, the TOC should be clickable.

I would then like to be able to change where the links jump to. Thank you.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
using System.Collections;

namespace PDFLinks
{
    class Program
    {
        //Folder that we are working in
        //private static readonly string WorkingFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs");
        //Sample PDF
        private static readonly string BaseFile = Path.Combine("C:\\Temp", "TableOfContentsTest.pdf");
        //Final file
        private static readonly string OutputFile = Path.Combine("C:\\Temp", "NewFile.pdf");

        static void Main(string[] args)
        {
            //Setup some variables to be used later
            PdfReader R = default(PdfReader);
            int PageCount = 0;

            //Open our reader
            R = new PdfReader(BaseFile);
            //Get the page count
            PageCount = R.NumberOfPages;
            Console.WriteLine("Page Count= " + PageCount);

            //Loop through each page
            //for (int i = 1; i <= PageCount; i++)
            //{
                //Get the current page
                PdfDictionary PageDictionary = R.GetPageN(1);
                //Get all of the annotations for the current page
                PdfArray Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
                //Make sure we have something
                if ((Annots == null) || (Annots.Length == 0))
                {
                    Console.WriteLine("nothing");
                }

                //Loop through each annotation
                if (Annots != null)
                {
                    Console.WriteLine("ANNOTS Not Null" + Annots[0]);
                    foreach (PdfObject A in Annots.ArrayList)
                    {
                        //Resolve the (usually indirect) array entry to the annotation dictionary
                        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);
                        //Make sure this is a Link annotation (GetAsName returns null if /Subtype is missing)
                        if (!PdfName.LINK.Equals(AnnotationDictionary.GetAsName(PdfName.SUBTYPE)))
                            continue;
                        //Make sure this annotation has an ACTION
                        if (AnnotationDictionary.Get(PdfName.A) == null)
                            continue;
                        if (AnnotationDictionary.Get(PdfName.A) != null) 
                        {
                            Console.WriteLine("ACTION Not Null");
                        }
                        //Get the ACTION for the current annotation
                        PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);

                        // Test if it is a URI action (there are many other types of actions,
                        // some of which might mimic URI, such as JavaScript,
                        // but those need to be handled separately)
                        if (PdfName.URI.Equals(AnnotationAction.GetAsName(PdfName.S)))
                        {
                            PdfString Destination = AnnotationAction.GetAsString(PdfName.URI);
                            Console.WriteLine("URI: " + Destination);
                        }
                    }
                }
           //}
        }
    }
}

Solution

  • Destinations

    In your Link annotations you only look for an action (A) entry, but there may alternatively be a destination (Dest) entry; cf. the PDF specification ISO 32000-2:

    A dictionary (Optional; PDF 1.1) An action that shall be performed when the link annotation is activated (see 12.6, "Actions").

    Dest array, name or byte string (Optional; not permitted if an A entry is present) A destination that shall be displayed when the annotation is activated (12.3.2, "Destinations").

    (ISO 32000-2 Table 176 — Additional entries specific to a link annotation)
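To see which of the two entries the TOC links in your sample actually use, a quick check like the following may help. It is a minimal sketch assuming iTextSharp 5.x (the `iTextSharp.text.pdf` API used in the question), the question's sample path, and that the TOC is on page 1; the class and method names are hypothetical.

```csharp
using System;
using iTextSharp.text.pdf;

static class LinkTargetReport
{
    // Hypothetical helper: reports which of the two mutually exclusive entries a Link uses.
    public static string DescribeLinkTarget(PdfDictionary annotation)
    {
        // Only Link annotations are of interest; GetAsName returns null if /Subtype is missing.
        if (!PdfName.LINK.Equals(annotation.GetAsName(PdfName.SUBTYPE)))
            return "not a link";
        // Per ISO 32000-2 Table 176, /Dest is not permitted if /A is present.
        if (annotation.Get(PdfName.A) != null)
            return "action (A)";
        if (annotation.Get(PdfName.DEST) != null)
            return "destination (Dest)";
        return "neither";
    }

    static void Main(string[] args)
    {
        // Assumed path, taken from the question; pass another file as the first argument.
        string path = args.Length > 0 ? args[0] : @"C:\Temp\TableOfContentsTest.pdf";
        PdfReader reader = new PdfReader(path);
        try
        {
            PdfArray annots = reader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
            if (annots == null)
            {
                Console.WriteLine("Page 1 has no annotations.");
                return;
            }
            foreach (PdfObject entry in annots.ArrayList)
            {
                // Annotation entries are usually indirect references; resolve them first.
                PdfDictionary annotation = PdfReader.GetPdfObject(entry) as PdfDictionary;
                if (annotation != null)
                    Console.WriteLine(DescribeLinkTarget(annotation));
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If this prints "destination (Dest)" for your TOC entries, that explains the null action: the link target is stored directly in the annotation.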

    There are a number of types of destinations, cf. this answer, in particular the specification quote there; the code there handling some of those types may also be of interest.
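For the explicit form (an array whose first element references the target page) and the two named forms (a name looked up in the catalog's /Dests dictionary, or a byte string looked up in the /Dests name tree of the catalog's /Names dictionary), a resolver could look roughly like this. It is a sketch assuming iTextSharp 5.x with hypothetical helper names; it matches page references by object number only and does not cover every corner case of the specification.

```csharp
using System;
using iTextSharp.text.pdf;

static class DestinationResolver
{
    // Hypothetical helper: turns any destination form into an explicit array, or null.
    public static PdfArray ToExplicitDestination(PdfReader reader, PdfObject dest)
    {
        PdfObject direct = PdfReader.GetPdfObject(dest);
        if (direct == null) return null;
        if (direct.IsArray()) return (PdfArray)direct;   // e.g. [pageRef /XYZ left top zoom]

        PdfDictionary catalog = reader.Catalog;
        PdfObject target = null;
        if (direct.IsName())
        {
            // Name: looked up in the catalog's /Dests dictionary (PDF 1.1 style).
            PdfDictionary dests = catalog.GetAsDict(PdfName.DESTS);
            if (dests != null) target = dests.GetDirectObject((PdfName)direct);
        }
        else if (direct.IsString())
        {
            // Byte string: looked up in the /Dests name tree of the /Names dictionary.
            PdfDictionary names = catalog.GetAsDict(PdfName.NAMES);
            PdfDictionary tree = names == null ? null : names.GetAsDict(PdfName.DESTS);
            if (tree != null) target = LookUpNameTree(tree, direct.ToString());
        }
        // The value is either the array itself or a dictionary whose /D entry is the array.
        target = PdfReader.GetPdfObject(target);
        if (target != null && target.IsDictionary())
            target = ((PdfDictionary)target).GetDirectObject(PdfName.D);
        return target as PdfArray;
    }

    // Name tree walk: leaves hold /Names [key1 value1 key2 value2 ...], other nodes hold /Kids.
    static PdfObject LookUpNameTree(PdfDictionary node, string key)
    {
        PdfArray pairs = node.GetAsArray(PdfName.NAMES);
        if (pairs != null)
            for (int i = 0; i + 1 < pairs.Size; i += 2)
            {
                PdfString k = pairs.GetAsString(i);
                if (k != null && k.ToString() == key) return pairs.GetPdfObject(i + 1);
            }
        PdfArray kids = node.GetAsArray(PdfName.KIDS);
        if (kids != null)
            for (int i = 0; i < kids.Size; i++)
            {
                PdfDictionary kid = kids.GetAsDict(i);
                PdfObject found = kid == null ? null : LookUpNameTree(kid, key);
                if (found != null) return found;
            }
        return null;
    }

    // Returns the 1-based page number an explicit destination points to, or -1.
    public static int GetPageNumber(PdfReader reader, PdfArray dest)
    {
        if (dest == null || dest.Size == 0) return -1;
        // Local destinations reference the page object; remote ones use an integer instead.
        PdfIndirectReference pageRef = dest.ArrayList[0] as PdfIndirectReference;
        if (pageRef == null) return -1;
        for (int i = 1; i <= reader.NumberOfPages; i++)
            if (reader.GetPageOrigRef(i).Number == pageRef.Number) return i;
        return -1;
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : @"C:\Temp\TableOfContentsTest.pdf";
        PdfReader reader = new PdfReader(path);
        PdfArray annots = reader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
        if (annots != null)
            foreach (PdfObject entry in annots.ArrayList)
            {
                PdfDictionary link = PdfReader.GetPdfObject(entry) as PdfDictionary;
                if (link == null || link.Get(PdfName.DEST) == null) continue;
                PdfArray dest = ToExplicitDestination(reader, link.Get(PdfName.DEST));
                Console.WriteLine("Dest points to page " + GetPageNumber(reader, dest));
            }
        reader.Close();
    }
}
```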

    Actions

    Even for Links with Actions, you only consider a) the first action and b) actions of type URI.

    Multiple actions

    Links can trigger a sequence of actions, the follow-up actions being referenced from the first action; cf. the specification:

    Next dictionary or array (Optional; PDF 1.2) The next action or sequence of actions that shall be performed after the action represented by this dictionary. The value is either a single action dictionary or an array of action dictionaries that shall be performed in order; see Note 1 for further discussion.

    NOTE 1 The action dictionary’s Next entry (PDF 1.2) allows sequences of actions to be chained together. For example, the effect of clicking a link annotation with the mouse might be to play a sound, jump to a new page, and start up a movie. Note that the Next entry is not restricted to a single action but may contain an array of actions, each of which in turn may have a Next entry of its own. The actions may thus form a tree instead of a simple linked list.

    (ISO 32000-2 Table 196 — Entries common to all action dictionaries)

    As the example in the NOTE implies, the jump to a new page need not be the first action of a Link, so for your task you should definitely inspect all the actions of your Link.
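Collecting all actions of a Link, including those chained via Next (a single dictionary or an array, recursively), can be sketched as below. It assumes iTextSharp 5.x and uses hypothetical names; the depth and count limits are an arbitrary safeguard against malformed files, not something the specification prescribes. The traversal is depth-first, each action followed by its own Next actions, which as far as I understand the specification is also the order in which a viewer performs them.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class ActionWalker
{
    // Arbitrary safeguards against /Next entries that loop in malformed files.
    const int MaxDepth = 100;
    const int MaxActions = 1000;

    // Hypothetical helper: the action plus everything reachable through /Next.
    public static List<PdfDictionary> CollectActions(PdfObject action)
    {
        var result = new List<PdfDictionary>();
        Collect(action, result, 0);
        return result;
    }

    static void Collect(PdfObject obj, List<PdfDictionary> result, int depth)
    {
        if (depth > MaxDepth || result.Count >= MaxActions) return;
        PdfObject direct = PdfReader.GetPdfObject(obj);   // resolves indirect references
        if (direct == null) return;
        if (direct.IsArray())
        {
            // /Next may be an array of actions, performed in order.
            foreach (PdfObject item in ((PdfArray)direct).ArrayList)
                Collect(item, result, depth + 1);
        }
        else if (direct.IsDictionary())
        {
            PdfDictionary action = (PdfDictionary)direct;
            result.Add(action);
            // Each action may in turn have its own /Next (dictionary or array).
            Collect(action.Get(PdfName.NEXT), result, depth + 1);
        }
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : @"C:\Temp\TableOfContentsTest.pdf";
        PdfReader reader = new PdfReader(path);
        PdfArray annots = reader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
        if (annots != null)
            foreach (PdfObject entry in annots.ArrayList)
            {
                PdfDictionary link = PdfReader.GetPdfObject(entry) as PdfDictionary;
                if (link == null || link.Get(PdfName.A) == null) continue;
                // Print the type (/S) of every action the link would trigger.
                foreach (PdfDictionary action in CollectActions(link.Get(PdfName.A)))
                    Console.WriteLine(action.GetAsName(PdfName.S));
            }
        reader.Close();
    }
}
```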

    Action types

    A uniform resource identifier (URI) is a string that identifies (resolves to) a resource on the Internet — typically a file that is the destination of a hypertext link, although it may also resolve to a query or other entity. (URIs are described in Internet RFC 3986, Uniform Resource Identifiers (URI): Generic Syntax.)

    A URI action causes a URI to be resolved.

    (ISO 32000-2, section 12.6.4.8 URI actions)

    Thus, URI actions are quite unlikely to be found in the TOC of a PDF. You had better look for GoTo actions.
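Telling GoTo actions apart from URI (and other) actions comes down to checking the action's S entry. A sketch, again assuming iTextSharp 5.x and a hypothetical helper name, that prints each Link action's type and, for GoTo actions, its D entry:

```csharp
using System;
using iTextSharp.text.pdf;

static class ActionTypes
{
    // Hypothetical helper: the /D entry of a GoTo action, or null for any other action type.
    public static PdfObject GetGoToDestination(PdfDictionary action)
    {
        if (PdfName.GOTO.Equals(action.GetAsName(PdfName.S)))
            return action.GetDirectObject(PdfName.D);   // a name, byte string, or array
        return null;
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : @"C:\Temp\TableOfContentsTest.pdf";
        PdfReader reader = new PdfReader(path);
        PdfArray annots = reader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
        if (annots != null)
            foreach (PdfObject entry in annots.ArrayList)
            {
                PdfDictionary link = PdfReader.GetPdfObject(entry) as PdfDictionary;
                PdfDictionary action = link == null ? null : link.GetAsDict(PdfName.A);
                if (action == null) continue;
                // Action type (/GoTo, /URI, /GoToR, /JavaScript, ...) and, for GoTo, its target.
                Console.WriteLine(action.GetAsName(PdfName.S) + " -> " + GetGoToDestination(action));
            }
        reader.Close();
    }
}
```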

    A go-to action changes the view to a specified destination (page, location, and magnification factor). "Table 202 — Additional entries specific to a go-to action" shows the action dictionary entries specific to this type of action.

    NOTE Specifying a go-to action in the A entry of a link annotation or outline item (see "Table 176 — Additional entries specific to a link annotation" and "Table 151 — Entries in an outline item dictionary") has the same effect as specifying the destination directly with the Dest entry.

    (ISO 32000-2 section 12.6.4.2 Go-To actions)

    D name, byte string, or array (Required) The destination to jump to (see 12.3.2, "Destinations").

    (ISO 32000-2 Table 202 — Additional entries specific to a go-to action)

    When inspecting GoTo actions, therefore, you eventually have to deal with the same kind of target specifications as when inspecting immediate link destinations discussed at the top of the answer.
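For the original goal of redirecting a TOC link, one approach is to take the target either from Dest or from a GoTo action's D and, if it is an explicit destination array, replace its first element (the page reference) with the reference of the new target page. This is a minimal sketch assuming iTextSharp 5.x, the question's file paths, that page 1 holds the TOC and that page 2 is the desired target; the helper names are hypothetical, and named destinations as well as actions chained via Next are deliberately not handled.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

static class RetargetLinks
{
    // Hypothetical helper: the link's target, given directly (/Dest) or via a GoTo action (/A).
    public static PdfObject GetLinkDestination(PdfDictionary link)
    {
        PdfObject dest = link.GetDirectObject(PdfName.DEST);
        if (dest != null) return dest;
        PdfDictionary action = link.GetAsDict(PdfName.A);
        if (action != null && PdfName.GOTO.Equals(action.GetAsName(PdfName.S)))
            return action.GetDirectObject(PdfName.D);
        return null;
    }

    // Points an explicit destination array at another page, keeping the view
    // parameters (/XYZ left top zoom, /Fit, ...) that follow the page reference.
    public static bool Retarget(PdfObject dest, PdfIndirectReference newPage)
    {
        PdfArray array = dest as PdfArray;
        if (array == null || array.Size == 0) return false;   // named destinations: not handled
        array.Set(0, newPage);
        return true;
    }

    static void Main()
    {
        PdfReader reader = new PdfReader(@"C:\Temp\TableOfContentsTest.pdf");
        PdfIndirectReference page2 = reader.GetPageOrigRef(2);   // assumes at least 2 pages
        PdfArray annots = reader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
        if (annots != null)
            foreach (PdfObject entry in annots.ArrayList)
            {
                PdfDictionary link = PdfReader.GetPdfObject(entry) as PdfDictionary;
                if (link == null || !PdfName.LINK.Equals(link.GetAsName(PdfName.SUBTYPE))) continue;
                Console.WriteLine(Retarget(GetLinkDestination(link), page2) ? "retargeted" : "skipped");
            }
        // The stamper writes the reader's (now modified) objects to the new file.
        using (var fs = new FileStream(@"C:\Temp\NewFile.pdf", FileMode.Create))
        {
            PdfStamper stamper = new PdfStamper(reader, fs);
            stamper.Close();
        }
    }
}
```

Retargeting every link on page 1 is only for demonstration; in practice you would select the links to change, for example by their Rect or their current target page. Also keep in mind that if a destination array is an indirect object shared with other links or with outline entries, changing it affects all of them.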