c# .net pdf itext tagging

Tagging individual pages of a PDF with iTextSharp in C#


I am currently working with iTextSharp 5.5.6.0.

My goal is to add a key to each page and have those keys persist when I read the document again with another application. I want to be able to keep track of every page individually (each key is unique and comes from another source).

This is my import/write code:

 using (PdfReader reader = new PdfReader(sourcePdfPath))
 {
     // pageNumber is the 1-based number of the page being copied
     using (Document document = new Document(reader.GetPageSizeWithRotation(pageNumber)))
     {
         PdfCopy pdfCopyProvider = new PdfCopy(document, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
         pdfCopyProvider.SetTagged();
         pdfCopyProvider.PdfVersion = PdfWriter.VERSION_1_7;
         document.Open(); // open after configuring the copy, before adding pages

         PdfImportedPage importedPage = pdfCopyProvider.GetImportedPage(reader, pageNumber, true);
         importedPage.SetAccessibleAttribute(PdfName.ALT, new PdfString("MYKEY"));
         pdfCopyProvider.AddPage(importedPage);
     }
 }

This is my read code:

 using (MemoryStream ms = new MemoryStream())
 {
     Document document = new Document();
     PdfCopy copy = new PdfCopy(document, ms);
     copy.SetTagged();
     document.Open();
     for (int i = 0; i < pdfs.Count; ++i)
     {
         var pdf = File.ReadAllBytes(pdfs[i]);
         PdfReader reader = new PdfReader(pdf);
         int n = reader.NumberOfPages;
         for (int page = 1; page <= n; ++page)
         {
             var importPage = copy.GetImportedPage(reader, page, true);
             var myKey = importPage.GetAccessibleAttribute(PdfName.ALT);
             if (myKey != null)
             {
                 // Do something with the key
             }
             copy.AddPage(importPage); // every page is copied, with or without a key
         }
     }
     document.Close();
     copy.Close();

     return ms.ToArray();
 }

I am trying to add an accessibility ALT text. Currently, I use that attribute on images, and all applications are set to leave those attributes untouched.

The problem is that when I add the attribute this way, save the result to a PDF file, and then read it in another process, the attribute is no longer there.

I am open to other ways of giving each page a primary key that I can assign, read, and remove.

I am trying to avoid adding a hidden field on each page.


Solution

  • I have little experience with iText programming or with C#, so I'm ideal to answer your question :)

    First of all, if all you want to do is mark a page and find it again afterwards, please do not use the accessibility features in the PDF. Accessibility is there for assistive devices; abusing those features isn't nice.

    Especially because - if I understand correctly what you want to do - there is no need to do so. If you want to mark a page, you should look for the page dictionary, for example:

    PdfReader reader = new iTextSharp.text.pdf.PdfReader(file_content);
    PdfDictionary pageDict = reader.GetPageN(i);
    

    Copied from: http://goobbe.com/questions/8099416/how-to-get-the-userunit-property-from-a-pdffile-using-itextsharp-pdfreader

    Once you have that dict, you can insert your own private key in there:

    public void Put(PdfName key, PdfObject value);
    

    The value you assign is up to you, but if you want to follow the rules, you have to use a second-class PDF name as the key. Such a key consists of your developer prefix, which should be registered so that it is unique, followed by a private part. For example, a key could look like:

    FICL:PageNumber
    

    In that case "FICL" is your developer prefix and "PageNumber" is your identification of the data you are adding.
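
    The steps above (get the page dictionary, put a private key in it, save, read it back) could be sketched roughly as follows. This is an untested sketch against iTextSharp 5.x, not a definitive implementation. The class name PageKeys, its three methods, and the key name "FICL:PageKey" are made up for illustration; replace "FICL" with your own registered prefix. Whether other applications keep an unknown page-dictionary entry when they rewrite the file is something you would need to check for each application.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper: stores one private key per page in the page dictionary.
public static class PageKeys
{
    // Assumption: "FICL" stands in for your own registered developer prefix.
    private static readonly PdfName PageKey = new PdfName("FICL:PageKey");

    // Writes keys[i] into the dictionary of page i + 1 and returns the new PDF.
    public static byte[] Assign(byte[] pdf, string[] keys)
    {
        PdfReader reader = new PdfReader(pdf);
        for (int page = 1; page <= reader.NumberOfPages && page <= keys.Length; page++)
        {
            PdfDictionary pageDict = reader.GetPageN(page);
            pageDict.Put(PageKey, new PdfString(keys[page - 1]));
        }
        return Save(reader);
    }

    // Returns the key stored on the given 1-based page, or null if there is none.
    public static string Read(byte[] pdf, int page)
    {
        PdfReader reader = new PdfReader(pdf);
        try
        {
            PdfString value = reader.GetPageN(page).GetAsString(PageKey);
            return value == null ? null : value.ToUnicodeString();
        }
        finally
        {
            reader.Close();
        }
    }

    // Removes the key from every page and returns the new PDF.
    public static byte[] Remove(byte[] pdf)
    {
        PdfReader reader = new PdfReader(pdf);
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            reader.GetPageN(page).Remove(PageKey);
        }
        return Save(reader);
    }

    // Writes the (modified) reader content to memory using PdfStamper.
    private static byte[] Save(PdfReader reader)
    {
        MemoryStream output = new MemoryStream();
        PdfStamper stamper = new PdfStamper(reader, output);
        stamper.Close();
        reader.Close();
        return output.ToArray(); // ToArray still works after the stream is closed
    }
}
```

    Usage would look like `byte[] tagged = PageKeys.Assign(File.ReadAllBytes(path), keys);` followed later by `PageKeys.Read(tagged, 1)`. Working on byte arrays rather than file paths keeps the sketch free of file handling; PdfReader and PdfStamper also accept paths and streams.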

    To register a developer prefix, see the Adobe web site, for example here: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdfregistry_v3.pdf

    Hope this helps.

    PS: If anyone here knows who actually owns the "FICL" prefix and where the letters come from, I'll buy you a beer :)