I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.
I am able to retrieve the entity names easily enough using XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?
I noticed that I can retrieve the value for text-only entities using InnerText, but this doesn't work for entities that contain XML tags.
Is the best way to resort to a regex?
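To make the starting point concrete, here is a minimal sketch of the lookup described above. The class name EntityNames and the method ReadInnerText are hypothetical, and the sketch assumes the XML is already available as a string (a real application would more likely call Load with a file path).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class EntityNames
{
    // Hypothetical helper: maps each entity declared in the DTD to its InnerText.
    public static Dictionary<string, string> ReadInnerText(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // assumption: XML held in a string; Load(path) works the same way

        Dictionary<string, string> result = new Dictionary<string, string>();
        if (doc.DocumentType == null)
        {
            return result; // no DOCTYPE, so there are no entities to list
        }

        // Entities is an XmlNamedNodeMap holding one node per declared entity.
        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            // InnerText concatenates text content only, so any markup
            // inside the entity value is not preserved here.
            result[entity.Name] = entity.InnerText;
        }
        return result;
    }
}
```

This gives the names reliably, but the values are only usable when the entity contains plain text.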
Let's say that I have a document like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "©">
]>
<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>
I want to present a list to the user containing the three entity names (test, wwwc, and copy), along with their values (the text in quotes following the name). I had not thought through the question of entities nested within other entities, so I would be interested in a solution that either completely expands the entity values or shows the text just as it is in the quotes.
Although it's probably not the most elegant solution possible, I came up with something that seems to work well enough for my purposes. First, I parsed the original document and retrieved its document type, which carries the entity declarations. Then I created a small in-memory XML document with the same document type, so that it declares all the same entities. Next, I added an entity reference for each entity to the temporary document. Finally, I read the InnerXml of each reference, which holds the expanded value of the entity, markup included.
Here's some sample code:
// namespaces used: System.Collections.Generic, System.Net, System.Xml

// parse the original document and retrieve its entities
XmlDocument parsedXmlDocument = new XmlDocument();
XmlUrlResolver resolver = new XmlUrlResolver();
resolver.Credentials = CredentialCache.DefaultCredentials;
parsedXmlDocument.XmlResolver = resolver;
parsedXmlDocument.Load(path);

// create a temporary xml document with all the entities and add references to them
// the references can then be used to retrieve the value for each entity
XmlDocument entitiesXmlDocument = new XmlDocument();
XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
entitiesXmlDocument.AppendChild(dec);

// copying the document type re-declares every entity in the temporary document
XmlDocumentType sourceDocType = parsedXmlDocument.DocumentType;
XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(
    sourceDocType.Name,
    sourceDocType.PublicId,
    sourceDocType.SystemId,
    sourceDocType.InternalSubset);
entitiesXmlDocument.AppendChild(newDocType);

XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
entitiesXmlDocument.AppendChild(root);
XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;

// build a dictionary of entity names and values
Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
for (int i = 0; i < entitiesMap.Count; i++)
{
    string entityName = entitiesMap.Item(i).Name;

    // wrap an entity reference in an element named after the entity
    XmlElement entityElement = entitiesXmlDocument.CreateElement(entityName);
    XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entityName);
    entityElement.AppendChild(entityRefElement);
    root.AppendChild(entityElement);

    // do not add parameter entities or invalid entities
    // this can be determined by checking for an empty string
    string entityValue = entityElement.ChildNodes[0].InnerXml;
    if (!string.IsNullOrEmpty(entityValue))
    {
        entitiesDictionary.Add(entityName, entityValue);
    }
}
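
The code above yields expanded values. For the other option raised in the question, showing the text exactly as it appears in the quotes, a regex over the internal subset is one possibility. The following is only a rough sketch: the class name RawEntityValues, the method Parse, and the pattern itself are my own assumptions, not part of any library. It reads only declarations in the internal subset, ignores parameter entities and external (SYSTEM/PUBLIC) entities, and would be fooled by a declaration that has been commented out inside the DTD.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RawEntityValues
{
    // Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
    // A leading % (parameter entity) or a SYSTEM/PUBLIC keyword
    // prevents a match, so those declarations are skipped.
    private static readonly Regex Declaration = new Regex(
        @"<!ENTITY\s+(?<name>[^\s%""']+)\s+(?:""(?<value>[^""]*)""|'(?<value>[^']*)')\s*>",
        RegexOptions.Compiled);

    // Hypothetical helper: returns entity name -> literal text between the quotes.
    public static Dictionary<string, string> Parse(string internalSubset)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Declaration.Matches(internalSubset ?? string.Empty))
        {
            string name = m.Groups["name"].Value;

            // XML treats the first declaration of an entity as binding,
            // so later duplicates are ignored.
            if (!result.ContainsKey(name))
            {
                result.Add(name, m.Groups["value"].Value);
            }
        }
        return result;
    }
}
```

With the document already loaded as above, it could be called as RawEntityValues.Parse(parsedXmlDocument.DocumentType.InternalSubset). Nested entity references and character references are left unexpanded, which is the point of this variant.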