I am looking into converting some Markdown text to plain text. After reading existing questions, it's apparent that the easiest solution would be to convert the Markdown to HTML with an existing converter, then convert the HTML to plain text. However, I am still a little baffled, as I need to retain the href of the a tag that comes from the HTML.
E.g. this Markdown, "some text [click here](https://somelink.com)", gets converted to this HTML:
<p>some text <a href="https://somelink.com">click here</a></p>
Then, when I convert that HTML to plain text, it's "some text click here".
How can I convert the original Markdown to something like "some text https://somelink.com"?
Following on from the answer by Judah Gabriel Himango, I made changes to the method that steps through the HTML elements.
I added a switch case for the a tag that writes out the value of its href attribute, and I also set a flag to stop the method from iterating through the a tag's children, since it is the href that is important in my case and not the link text.
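    case HtmlNodeType.Element:
        // isATag must be a local bool that is false for every node visited
        // (for example, declared at the top of this method), otherwise the
        // children of later elements would be skipped as well.
        switch (node.Name)
        {
            case "p":
                // treat paragraphs as crlf
                outText.Write("\r\n");
                break;
            case "br":
                outText.Write("\r\n");
                break;
            case "a":
                // write the link target instead of the link text;
                // writes nothing if the tag has no href attribute
                outText.Write(node.GetAttributeValue("href", string.Empty));
                isATag = true;
                break;
        }
        // skip the children of <a> so the link text is not written after the href
        if (node.HasChildNodes && !isATag)
        {
            ConvertContentTo(node, outText);
        }
        break;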
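As a possible alternative to the approach above (not the method I used), if only simple inline links matter, the Markdown could be pre-processed with a regular expression before any conversion, so the URL is already in the text. This is only a minimal sketch: the class and method names (MarkdownLinks, ReplaceLinksWithUrls) are made up for illustration, and the pattern is an assumption that covers links of the form [text](url) or [text](url "title"). It does not handle reference-style links, nested brackets, or URLs that contain a closing parenthesis.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkdownLinks
{
    // Hypothetical helper: matches inline links such as [text](url) or
    // [text](url "title"). The (?<!!) lookbehind skips images: ![alt](src).
    private static readonly Regex InlineLink = new Regex(
        @"(?<!!)\[(?<text>[^\]]*)\]\((?<url>[^)\s]+)(?:\s+""[^""]*"")?\)",
        RegexOptions.Compiled);

    // Replaces every matched inline link with just its URL.
    public static string ReplaceLinksWithUrls(string markdown)
    {
        return InlineLink.Replace(markdown, m => m.Groups["url"].Value);
    }
}
```

With the example from the question, MarkdownLinks.ReplaceLinksWithUrls("some text [click here](https://somelink.com)") should give "some text https://somelink.com". The HTML-based approach is likely more robust for anything beyond simple inline links, because the Markdown converter has already done the parsing.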