Trying to repair malformed HTML markup.
Let's say I've got the following markup:
<li>Foo</li>
<li>Bar</li>
or
<li>Foo</li>
<li>Bar</li>
</ul>
or
<ul>
<li>Foo</li>
<li>Bar</li>
Also, there might be some text before or after the list.
What I've tried:
HtmlNode firstLiNode = doc.DocumentNode.ChildNodes.FirstOrDefault(n => n.Name.Equals("li"));
if (firstLiNode != null &&
    (firstLiNode.PreviousSibling == null || !firstLiNode.PreviousSibling.Name.Equals("ul")))
{
    doc.DocumentNode.InsertBefore(HtmlNode.CreateNode("<ul>"), firstLiNode);
}
In my mind, this should just add a <ul> tag before the first <li> tag. Following the same logic, I could insert a </ul> tag at the end of the list if needed. However, what I am getting instead is <ul></ul><li>Foo</li><li>Bar</li>, without even trying to insert the closing ul tag.
Question: What am I doing wrong?
My solution was to strip all UL tags and then insert a new one if needed, as follows:
HtmlNode firstLiNode = doc.DocumentNode.ChildNodes.FirstOrDefault(n => n.Name.Equals("li"));
if (firstLiNode != null)
{
    // Retrieve all LI nodes that should be wrapped with the UL tag.
    IEnumerable<HtmlNode> liNodes = doc.DocumentNode.SelectNodes(@"//li");
    HtmlNode ulNode = HtmlNode.CreateNode("<ul>");
    // Insert LI tags into newly created UL tag.
    foreach (HtmlNode liNode in liNodes)
    {
        HtmlNode clone = liNode.CloneNode(true);
        ulNode.AppendChild(clone);
    }
    // Insert newly created UL tag with child LI nodes before "original" LI tag without UL tag.
    doc.DocumentNode.InsertBefore(ulNode, firstLiNode);
    // Remove LI tags that are not wrapped with UL tag.
    foreach (HtmlNode liNode in liNodes)
    {
        doc.DocumentNode.RemoveChild(liNode);
    }
}
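The reason the first attempt produced <ul></ul> is that the HtmlAgilityPack tree holds complete element nodes rather than individual tags, so HtmlNode.CreateNode("<ul>") yields a whole (empty) element, which serializes with both its opening and closing tag. For comparison, here is a minimal sketch of the same wrapping idea that moves the existing LI nodes into the new element instead of cloning them. It assumes the orphaned LI nodes are direct children of DocumentNode; the class name ListRepair and the method name WrapOrphanListItems are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class ListRepair
{
    // Hypothetical helper: wraps all top-level <li> nodes in a single <ul> element.
    // Assumption: the <li> nodes to wrap are direct children of doc.DocumentNode.
    public static void WrapOrphanListItems(HtmlDocument doc)
    {
        // Take a snapshot first, because moving nodes mutates ChildNodes.
        List<HtmlNode> orphans = doc.DocumentNode.ChildNodes
            .Where(n => n.Name == "li")
            .ToList();

        if (orphans.Count == 0)
        {
            return;
        }

        // Create an empty <ul> element and place it where the first <li> currently sits.
        HtmlNode ul = doc.CreateElement("ul");
        doc.DocumentNode.InsertBefore(ul, orphans[0]);

        // Detach each <li> from the document root and re-attach it under the <ul>.
        foreach (HtmlNode li in orphans)
        {
            li.Remove();
            ul.AppendChild(li);
        }
    }
}
```

Calling ListRepair.WrapOrphanListItems(doc) after doc.LoadHtml(...) and then reading doc.DocumentNode.OuterHtml should show the list items nested inside the new element. Text nodes that sat between the LI nodes (whitespace, for example) are left where they were, and any text before or after the list is not touched.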