
HTML Agility Pack: wrap LI items in a UL if required


I'm trying to fix malformed HTML markup.

Let's say I've got the following markup:

<li>Foo</li>
<li>Bar</li>

or

<li>Foo</li>
<li>Bar</li>
</ul>

or

<ul>
<li>Foo</li>
<li>Bar</li>

Also, there might be some text before or after the list.

What I've tried:

HtmlNode firstLiNode = doc.DocumentNode.ChildNodes.FirstOrDefault(n => n.Name.Equals("li"));
if (firstLiNode != null &&
    (firstLiNode.PreviousSibling == null || !firstLiNode.PreviousSibling.Name.Equals("ul")))
{
    doc.DocumentNode.InsertBefore(HtmlNode.CreateNode("<ul>"), firstLiNode);
}

In my mind, this should just add a <ul> tag before the first <li> tag. Following the same logic, I could insert a </ul> at the end of the list if needed. However, what I get instead is <ul></ul><li>Foo</li><li>Bar</li>, even though I haven't tried to insert the closing </ul> tag yet.

Question: What am I doing wrong?
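HtmlAgilityPack edits a tree of nodes rather than a stream of tags, so HtmlNode.CreateNode("<ul>") appears to parse the string into one complete, empty ul element. When the document is serialized, that empty element is written with both of its tags, which matches the <ul></ul> in the output above; a lone closing </ul> has no node of its own and cannot be inserted separately. The minimal console sketch below (the class name TagVersusNode is made up for illustration) checks what CreateNode returns and shows that the list only gains content when nodes are appended to it:

```csharp
using System;
using HtmlAgilityPack;

public static class TagVersusNode
{
    public static void Main()
    {
        // "<ul>" is parsed into a whole element node, not into a bare opening tag.
        HtmlNode ul = HtmlNode.CreateNode("<ul>");
        Console.WriteLine(ul.NodeType);         // Element
        Console.WriteLine(ul.Name);             // ul
        Console.WriteLine(ul.ChildNodes.Count); // 0

        // Content only ends up inside the list when nodes are appended to it.
        ul.AppendChild(HtmlNode.CreateNode("<li>Foo</li>"));
        Console.WriteLine(ul.ChildNodes.Count); // 1
    }
}
```

In other words, the fix is to move the existing <li> nodes into the new ul node rather than trying to place an opening tag and a closing tag around them.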


Solution

  • My solution was the following:

    Strip all UL tags, then insert a new one if needed. The wrapping step looks like this:

    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    // "doc" is the HtmlDocument being fixed; its UL tags have already been stripped.
    HtmlNode firstLiNode = doc.DocumentNode.ChildNodes.FirstOrDefault(n => n.Name.Equals("li"));
    if (firstLiNode != null)
    {
        // Retrieve all LI nodes that should be wrapped with the UL tag.
        IEnumerable<HtmlNode> liNodes = doc.DocumentNode.SelectNodes(@"//li");
        HtmlNode ulNode = HtmlNode.CreateNode("<ul>");
    
        // Copy each LI node into the newly created UL node.
        foreach (HtmlNode liNode in liNodes)
        {
            HtmlNode clone = liNode.CloneNode(true);
            ulNode.AppendChild(clone);
        }
    
        // Insert the new UL node (with its LI children) before the first unwrapped LI node.
        doc.DocumentNode.InsertBefore(ulNode, firstLiNode);
    
        // Remove the original, unwrapped LI nodes from whichever node contains them.
        foreach (HtmlNode liNode in liNodes)
        {
            liNode.ParentNode.RemoveChild(liNode);
        }
    }
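The snippet above covers only the wrapping step. A self-contained sketch of the whole strip-then-wrap approach might look like the following. It is an illustration under stated assumptions, not the poster's exact code: the ListFixer class and WrapListItems method are hypothetical names, nested lists are flattened (as stripping all UL tags implies), the <li> elements are assumed to sit at the top level of the fragment, and the original nodes are moved instead of being cloned and deleted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ListFixer
{
    // Hypothetical helper: strips existing <ul> wrappers, then wraps the
    // top-level <li> elements of the fragment in exactly one new <ul>.
    public static string WrapListItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 1: unwrap every existing <ul>, keeping its children in place.
        // SelectNodes returns null when nothing matches, hence the fallback.
        IEnumerable<HtmlNode> uls = doc.DocumentNode.SelectNodes("//ul") ?? Enumerable.Empty<HtmlNode>();
        foreach (HtmlNode ul in uls.ToList())
        {
            HtmlNode parent = ul.ParentNode;
            foreach (HtmlNode child in ul.ChildNodes.ToList())
            {
                ul.RemoveChild(child);          // detach from the old list
                parent.InsertBefore(child, ul); // re-attach where the list was
            }
            parent.RemoveChild(ul);
        }

        // Step 2: collect the <li> nodes that now sit directly under the root.
        List<HtmlNode> items = doc.DocumentNode.ChildNodes
            .Where(n => n.Name == "li")
            .ToList();
        if (items.Count == 0)
        {
            return doc.DocumentNode.OuterHtml;
        }

        // Step 3: insert one new <ul> before the first item and move the items into it.
        HtmlNode list = doc.CreateElement("ul");
        doc.DocumentNode.InsertBefore(list, items[0]);
        foreach (HtmlNode li in items)
        {
            doc.DocumentNode.RemoveChild(li);
            list.AppendChild(li);
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // The three malformed variants from the question.
        Console.WriteLine(WrapListItems("<li>Foo</li><li>Bar</li>"));
        Console.WriteLine(WrapListItems("<li>Foo</li><li>Bar</li></ul>"));
        Console.WriteLine(WrapListItems("<ul><li>Foo</li><li>Bar</li>"));
    }
}
```

Because each item is detached with RemoveChild before AppendChild re-attaches it, every node has a single parent at all times, and any text before or after the list stays where it was.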