
iText 7 Accessibility Check: Alternate Text


How can I read the accessibility properties, such as Alternate Text, from a PDF using iText 7, similar to what accessibility tools like Adobe Acrobat Pro display?


Solution

  • You can access that information via the tag tree.

    Open your PDF document as usual; read-only mode is enough for inspecting tags. You will then need a new tag tree pointer.

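    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Tagging;
    using iText.Kernel.Pdf.Tagutils;

    string input = "input.pdf";
    // Read-only mode (reader only, no writer) is enough for inspecting tags
    using PdfDocument document = new(new PdfReader(input));

    // GetTagStructureContext() throws for untagged PDFs, so check first
    if (!document.IsTagged())
    {
        Console.WriteLine("This PDF has no tag tree.");
        return;
    }

    // This code is usually in its own method so we can get as many pointers as we need
    // Get the tag pointer at the root tag
    TagTreePointer original = document.GetTagStructureContext().GetAutoTaggingPointer().MoveToRoot();
    // Get the structure element the pointer is on. We use this to make a new pointer
    PdfStructElem elem = original.GetContext().GetPointerStructElem(original);
    // Return a new, independent pointer for that element
    var pointer = original.GetContext().CreatePointerForStructElem(elem);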
    

    You can reuse a pointer, but because every move changes where it points, I find it helpful to get a new one for each operation.
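
    Since the setup comment says this code usually lives in its own method, here is a minimal sketch of such a helper. It assumes iText 7's `TagTreePointer(PdfDocument)` constructor, which creates an independent pointer already positioned at the root tag, as a shorter alternative to the three-step setup; `TagPointers` and `NewRootPointer` are hypothetical names.

    ```csharp
    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Tagutils;

    // Hypothetical helper class; the names are illustrative only
    public static class TagPointers
    {
        // Returns a fresh pointer at the root tag of a tagged document.
        // Each call creates an independent pointer, so moving one does not move the others.
        public static TagTreePointer NewRootPointer(PdfDocument document)
        {
            // The constructor uses the tag structure context, which only exists for tagged PDFs
            if (!document.IsTagged())
                throw new InvalidOperationException("The PDF is not tagged.");
            return new TagTreePointer(document);
        }
    }
    ```

    Usage: `TagTreePointer first = TagPointers.NewRootPointer(document).MoveToKid(0);` gives a pointer on the first child without touching any other pointer.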

    Next, move to the tag you want to examine. For simplicity we'll use the first child.

    // Move to the first child (assuming one exists and it is a tag, not marked content)
    pointer.MoveToKid(0);
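
    To look at every child rather than only the first, you can combine `GetKidsRoles()`, `MoveToKid(i)` and `MoveToParent()`. This is a sketch assuming iText 7's documented behavior that `GetKidsRoles()` reports marked-content kids as "MCR" and flushed kids as null, and that `MoveToKid` only accepts structure elements; `KidLister` and `ListKids` are hypothetical names.

    ```csharp
    using System;
    using iText.Kernel.Pdf.Tagutils;

    // Hypothetical helper class; the names are illustrative only
    public static class KidLister
    {
        // Prints the index, role and alternate text of each direct child tag of the
        // pointer's current tag, then leaves the pointer where it started.
        public static void ListKids(TagTreePointer pointer)
        {
            var roles = pointer.GetKidsRoles();
            for (int i = 0; i < roles.Count; i++)
            {
                // Flushed kids show up as null and marked content as "MCR";
                // MoveToKid only accepts structure elements, so skip those
                if (roles[i] == null || roles[i] == "MCR")
                    continue;
                pointer.MoveToKid(i);
                string alt = pointer.GetProperties().GetAlternateDescription();
                Console.WriteLine($"{i}: {roles[i]} Alt={alt ?? "(none)"}");
                // Return to the tag we started from before trying the next kid
                pointer.MoveToParent();
            }
        }
    }
    ```

    Moving back with `MoveToParent()` after each kid keeps the indices from `GetKidsRoles()` valid for the whole loop.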
    

    Now we can view the tag's properties.

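    // Read the tag's properties (each getter returns null when the property is not set)
    AccessibilityProperties props = pointer.GetProperties();
    string actualText = props.GetActualText();
    string altText = props.GetAlternateDescription();
    var id = props.GetStructureElementId();
    // Title is a bit more complicated. We have to use the low-level PDF structure element and read its T (title) entry
    PdfString title = pointer.GetContext().GetPointerStructElem(pointer).GetPdfObject().GetAsString(PdfName.T);
    Console.WriteLine($"Alt: {altText}, ActualText: {actualText}, Title: {title?.ToUnicodeString()}");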
    

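    In real code you will also want to handle null values (a property that is not set comes back as null) and missing or unexpected tags (an untagged document, or a kid that is marked content rather than a tag).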
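
    For a report over the whole file, you can also skip the pointer and walk the structure tree directly. This is a minimal sketch assuming iText 7's low-level tagging classes (`PdfStructTreeRoot`, `PdfStructElem`, `IStructureNode`) and their `GetAlt()` / `GetActualText()` accessors; `AltTextReport`, `Print` and `Walk` are hypothetical names. It covers the null cases: untagged files, non-tag kids, and unset properties.

    ```csharp
    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Tagging;

    // Hypothetical helper class; the names are illustrative only
    public static class AltTextReport
    {
        // Opens a PDF read-only and prints the role, Alt, ActualText and title (T)
        // of every structure element, indented by depth.
        public static void Print(string path)
        {
            using PdfDocument document = new(new PdfReader(path));
            PdfStructTreeRoot root = document.GetStructTreeRoot();
            if (root == null) // untagged PDF: there is no tag tree to inspect
            {
                Console.WriteLine("No tag tree (untagged PDF).");
                return;
            }
            foreach (IStructureNode kid in root.GetKids())
                Walk(kid, 0);
        }

        static void Walk(IStructureNode node, int depth)
        {
            // Skip null kids and marked-content references; only structure elements carry these properties
            if (node is not PdfStructElem elem)
                return;
            string role = elem.GetRole()?.GetValue();
            string alt = elem.GetAlt()?.ToUnicodeString();
            string actual = elem.GetActualText()?.ToUnicodeString();
            string title = elem.GetPdfObject().GetAsString(PdfName.T)?.ToUnicodeString();
            Console.WriteLine($"{new string(' ', depth * 2)}{role} Alt={alt ?? "(none)"} " +
                              $"ActualText={actual ?? "(none)"} Title={title ?? "(none)"}");
            foreach (IStructureNode kid in elem.GetKids())
                Walk(kid, depth + 1);
        }
    }
    ```

    Filtering the output for lines starting with `Figure` gives a quick list of images and whether they have alternate text.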