c# .net itext itext7

iText Read HTML Tag when Converting HTML to PDF with ConvertToElements


I want to create a TOC with iText 7, and my document is composed of multiple HTML strings. When I parse these strings with HtmlConverter.ConvertToElements, I want to check each element to see whether it is an H1 so that I can add it to the TOC, but I'm having a hard time achieving this. I am unable to get the role from the DefaultAccessibilityProperties of an element.

The information that I want is here in tagProperties.role



Solution

  • I did it. Cast your element to a Div and then read the tag name via its role.

    // "as" yields null for elements that are not a Div, so guard before dereferencing.
    var div = element as Div;
    if (div != null && div.GetAccessibilityProperties().GetRole() == "H1") title = true;
    

    And a bonus helper that extracts the text content of an element, which is useful for populating the TOC:

    // Requires: using System.Text; using iText.Layout.Element;
    // Recursively concatenates the text of an element and all of its descendants.
    private static string GetContent(IElement element)
    {
        var builder = new StringBuilder();
        if (element is Text text) builder.Append(text.GetText());
        if (element is IAbstractElement parent)
            foreach (var child in parent.GetChildren())
                builder.Append(GetContent(child));
        return builder.ToString();
    }
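
For completeness, here is a minimal sketch of how the two pieces might fit together when collecting TOC entries from several HTML strings. The class name TocSketch and the method CollectH1Titles are hypothetical names made up for illustration. The sketch assumes an iText 7 / pdfHTML version in which IAbstractElement is available (as in the helper above), and it compares against StandardRoles.H1 instead of the literal "H1".

```csharp
using System.Collections.Generic;
using System.Text;
using iText.Html2pdf;
using iText.Kernel.Pdf.Tagging;
using iText.Layout.Element;

public static class TocSketch
{
    // Hypothetical helper: returns the text of each top-level <h1>
    // found in the given HTML fragments, in document order.
    public static List<string> CollectH1Titles(IEnumerable<string> htmlFragments)
    {
        var titles = new List<string>();
        foreach (var html in htmlFragments)
        {
            foreach (IElement element in HtmlConverter.ConvertToElements(html))
            {
                // Check for a Div whose accessibility role is H1.
                if (element is Div div &&
                    div.GetAccessibilityProperties().GetRole() == StandardRoles.H1)
                {
                    titles.Add(GetContent(element));
                }
            }
        }
        return titles;
    }

    // Same recursive text extraction as the helper in the answer.
    private static string GetContent(IElement element)
    {
        var builder = new StringBuilder();
        if (element is Text text) builder.Append(text.GetText());
        if (element is IAbstractElement parent)
            foreach (var child in parent.GetChildren())
                builder.Append(GetContent(child));
        return builder.ToString();
    }
}
```

Note that this only inspects the top-level elements returned by ConvertToElements. An `<h1>` nested inside another block (for example a wrapping `<div>`) would not be found this way and would need the same kind of recursive walk that GetContent performs.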