c# .net xml xmlwriter

Remove self-closing tags (e.g. />) in an XmlDocument


In an XmlDocument, either when writing it or when modifying it later, is it possible to remove the self-closing tags (i.e. />) for a certain element?

For example: change

  • <img /> or <img></img> to <img>.
  • <br /> to <br>.

Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.

After reading another StackOverflow question, I tried setting the IsEmpty property to false, like so:

var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
foreach (var element in imgElements)
{
    element.IsEmpty = false;
}

However, that resulted in <img /> becoming <img></img>. As a hack, I also tried changing the OuterXml property directly, but that doesn't work (I didn't expect it to).

Question

Can you remove the self-closing tags from an XmlDocument? I honestly do not think you can, as the result would then be invalid XML (no closing tag), but I thought I would throw the question out to the community.

Update:

I ended up fixing the HTML string after exporting from the XmlDocument using a regular expression (written in the wonderful RegexBuddy).

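    // Rewrite self-closing meta, img and br tags as bare opening tags, keeping their attributes.
    var fixHtmlRegex = new Regex("<(?<tag>meta|img|br)(?<attributes>.*?)/>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<${tag}${attributes}>");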

It cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
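
For anyone reusing this post-processing step, here is a slightly stricter variation as a self-contained sketch. It is not the exact code from the update: the class name VoidTagFixer and the method Fix are made up for illustration, and the pattern assumes that attribute values never contain a literal > character. It adds a word boundary after the tag name, so a longer tag name that merely starts with br, img or meta is left alone, and it swallows the whitespace before /> so that <br /> becomes <br> rather than <br >.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (name is illustrative, not from the original post).
public static class VoidTagFixer
{
    // Matches self-closing meta/img/br tags. Assumes no '>' inside attribute values.
    private static readonly Regex SelfClosingVoidTag = new Regex(
        @"<(?<tag>meta|img|br)\b(?<attributes>[^>]*?)\s*/>",
        RegexOptions.IgnoreCase);

    // Returns the HTML with the matched tags rewritten as bare opening tags.
    public static string Fix(string html)
    {
        return SelfClosingVoidTag.Replace(html, "<${tag}${attributes}>");
    }
}
```

Calling VoidTagFixer.Fix on the string exported from the XmlDocument leaves every other element untouched. Elements that were serialized with an explicit end tag, such as <img></img>, are not handled by this pattern.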


Solution

  • You're right: it's not possible simply because it's invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, be it with the shortcut syntax /> or with an immediate closing tag.
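
To see the two forms XmlDocument is able to produce, a minimal sketch like the following can help. The class name EmptyElementDemo and the sample markup are invented for illustration. Toggling IsEmpty only switches between the self-closing form and an explicit end tag; as far as I know, neither setting yields a bare <br>.

```csharp
using System.Xml;

// Hypothetical demo class (name and sample markup are illustrative).
public static class EmptyElementDemo
{
    // Serializes a tiny document after setting IsEmpty on its <br> element.
    public static string Serialize(bool isEmpty)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<p><br /></p>");

        // The only child of <p> is the <br> element.
        var br = (XmlElement)doc.DocumentElement.FirstChild;

        // true keeps the short form; false forces a separate end tag.
        br.IsEmpty = isEmpty;

        return doc.OuterXml;
    }
}
```

Serialize(true) should keep the self-closing form and Serialize(false) should emit <br></br>, which matches the behavior described in the question. Getting a bare <br> therefore means changing the serialized string afterwards, or using an HTML-aware serializer instead of an XML one.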