My original text is something like this:
<p>
Lorem <span><i></i>ipsum</span> <a href="">dolor</a> sit <a href="link.html">amet</a>, consectetur adipiscing <span>elit</span>.
</p>
I am trying to keep only the text and the <a> elements, so the output should look like this:
Lorem ipsum <a href="">dolor</a> sit <a href="link.html">amet</a>, consectetur adipiscing elit.
Neither
htmlDoc.DocumentNode.SelectSingleNode("//p").InnerText;
nor
htmlDoc.DocumentNode.SelectSingleNode("//p").InnerHtml;
works for this case: InnerText drops the <a> tags along with everything else, and InnerHtml keeps every tag. How can I achieve that?
I solved this with a regex (input is the HTML string). I hope it helps someone in the future:
var output = Regex.Replace(input, @"<(?!\/?a(?=>|\s.*>))\/?.*?>", string.Empty);
Don't forget to add
using System.Text.RegularExpressions;
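For anyone who wants to try the pattern quickly, here is a minimal, self-contained sketch. The class name TagStripper and the method KeepOnlyAnchors are hypothetical names chosen for illustration, the input is a markup sample modeled on the one in the question, and the Trim() call is an addition to drop the line breaks left behind where the <p> tags were.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Same pattern as in the answer: matches any opening or closing tag
    // whose name is not exactly "a".
    private static readonly Regex NonAnchorTag =
        new Regex(@"<(?!\/?a(?=>|\s.*>))\/?.*?>");

    // Hypothetical helper: removes every non-<a> tag, then trims the
    // leading and trailing whitespace left over from the removed tags.
    public static string KeepOnlyAnchors(string html)
    {
        return NonAnchorTag.Replace(html, string.Empty).Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample markup modeled on the question (an assumption, not real data).
        string input =
            "<p>\n" +
            "Lorem <span><i></i>ipsum</span> <a href=\"\">dolor</a> sit " +
            "<a href=\"link.html\">amet</a>, consectetur adipiscing <span>elit</span>.\n" +
            "</p>";

        Console.WriteLine(TagStripper.KeepOnlyAnchors(input));
    }
}
```

Two caveats: by default `.` does not match a newline, so a tag that is split across several lines will not be removed unless RegexOptions.Singleline is added, and a regex like this does not understand comments, scripts, or attribute values that contain `>`. For messy real-world HTML, walking the parsed nodes is likely the safer route.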