In an XmlDocument, either when writing it or when modifying it later, is it possible to remove the self-closing tags (i.e. />) for certain elements?
For example: change <img /> or <img></img> to <img>, and <br /> to <br>. Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.
After reading another StackOverflow question, I tried setting the IsEmpty property to false, like so:
var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
foreach (var element in imgElements)
{
    element.IsEmpty = false;
}
However, that turned <img /> into <img></img>. As a hack, I also tried changing the OuterXml property directly, but that doesn't work (I didn't expect it to), since OuterXml is read-only.
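A minimal, self-contained reproduction of the IsEmpty attempt shows why it cannot help: IsEmpty only chooses between the two XML forms, the short tag and an explicit end tag. The one-element document and the helper name ExpandImgElements are made up for illustration, and the sketch assumes .NET 5 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper: applies the question's IsEmpty = false loop to a small document.
static string ExpandImgElements(string xml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);

    // Same XPath as in the question; ToList() snapshots the matches before editing them.
    var imgElements = doc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>().ToList();
    foreach (var element in imgElements)
    {
        // false selects the long form: the element is written with an explicit </img> end tag.
        element.IsEmpty = false;
    }
    return doc.OuterXml;
}

Console.WriteLine(ExpandImgElements("<p><img src=\"a.png\" /></p>"));
// prints <p><img src="a.png"></img></p>
```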
Question
Can you remove the self-closing tags from an XmlDocument? I honestly don't think so, as the result would be invalid XML (no closing tag), but I thought I would throw the question out to the community.
Update:
I ended up fixing the HTML string after exporting it from the XmlDocument, using a regular expression (written in the wonderful RegexBuddy):
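// [^>]*? keeps each match inside one tag; \s* drops the space before "/>" ("<br />" becomes "<br>")
var fixHtmlRegex = new Regex(@"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>", RegexOptions.IgnoreCase);
return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<${tag}${attributes}>");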
It cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
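The whole export-and-fix step can be sketched end to end. This assumes the document can be serialized with OuterXml (the question's htmlStringBuilder export isn't shown), uses a hypothetical helper named ToWordHtml, and relies on .NET 5 or later for top-level statements. The regex here uses [^>]*? for the attributes, so a match cannot run past the end of a tag, \b so that "br" cannot match a longer element name, and \s* to drop the space that XmlDocument writes before />.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper: serialize the document, then rewrite self-closing
// meta/img/br tags into HTML's unclosed form.
static string ToWordHtml(XmlDocument doc)
{
    // OuterXml stands in for the question's htmlStringBuilder export.
    string xml = doc.OuterXml;

    // Named substitutions make it explicit which groups are copied back.
    return Regex.Replace(
        xml,
        @"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>",
        "<${tag}${attributes}>",
        RegexOptions.IgnoreCase);
}

var page = new XmlDocument();
page.LoadXml("<html><head><meta charset=\"utf-8\"/></head><body><p>Hi<br/><img src=\"a.png\"/></p></body></html>");
Console.WriteLine(ToWordHtml(page));
// prints <html><head><meta charset="utf-8"></head><body><p>Hi<br><img src="a.png"></p></body></html>
```

Like any regex over markup, this assumes the serialized attribute values contain no literal >; other empty elements (for example <td />) are left untouched.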
You're right: it's not possible, simply because the result would be invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, either with the shortcut syntax /> or with an immediate closing tag.
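The point can be checked directly by letting XmlDocument.LoadXml decide, here through a small, hypothetical IsWellFormed helper (assuming .NET 5 or later for top-level statements). Both closed forms parse, while the bare HTML-style <img> is rejected with an XmlException, which is why the unclosed form can only be produced as text after serialization.

```csharp
using System;
using System.Xml;

// Hypothetical helper: true if the string parses as well-formed XML.
static bool IsWellFormed(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        // Thrown for mismatched or missing end tags, among other errors.
        return false;
    }
}

Console.WriteLine(IsWellFormed("<p><img src=\"a.png\" /></p>"));     // True: shortcut syntax
Console.WriteLine(IsWellFormed("<p><img src=\"a.png\"></img></p>")); // True: immediate closing tag
Console.WriteLine(IsWellFormed("<p><img src=\"a.png\"></p>"));       // False: <img> is never closed
```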