
How do I retrieve an XML entity value in C#?


I want to be able to display a list of entity names and values in a C#/.NET 4.0 application.

I am able to retrieve the entity names easily enough using XmlDocument.DocumentType.Entities, but is there a good way to retrieve the values of those entities?

I noticed that I can retrieve the value of text-only entities using InnerText, but this doesn't work for entities that contain XML tags.
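A minimal sketch of the starting point described above, assuming the document is already loaded into an `XmlDocument` (the class `EntityListing` and method `ListEntities` are hypothetical names). It enumerates `DocumentType.Entities` and reads each entity's `InnerText`. According to the question, this is fine for a text-only entity such as `wwwc` but drops the tags from an entity such as `test`.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class EntityListing
{
    // Returns (name, InnerText) pairs for every entity declared in the DOCTYPE.
    // InnerText works for text-only values but does not preserve markup.
    public static List<KeyValuePair<string, string>> ListEntities(XmlDocument doc)
    {
        var result = new List<KeyValuePair<string, string>>();
        XmlDocumentType docType = doc.DocumentType;
        if (docType == null)
        {
            return result; // no DOCTYPE, so no entity declarations
        }

        foreach (XmlNode node in docType.Entities)
        {
            XmlEntity entity = (XmlEntity)node;
            result.Add(new KeyValuePair<string, string>(entity.Name, entity.InnerText));
        }
        return result;
    }
}
```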

Is resorting to a regex the best option?

Let's say that I have a document like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document [
  <!ENTITY test "<para>only a test</para>">
  <!ENTITY wwwc "World Wide Web Corporation">
  <!ENTITY copy "&#xA9;">
]>

<document>
  <!-- The following image is the World Wide Web Corporation logo. -->
  <graphics image="logo" alternative="&wwwc; Logo"/>
</document>

I want to present a list to the user containing the three entity names (test, wwwc, and copy), along with their values (the text in quotes following the name). I had not thought through the question of entities nested within other entities, so I would be interested in a solution that either completely expands the entity values or shows the text just as it is in the quotes.
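If the raw text between the quotes is enough (the second option above), a regular expression over the internal subset text, which `XmlDocumentType.InternalSubset` exposes, can pull out each literal value without expanding anything. This is a simplified sketch and not a DTD parser. The pattern is an assumption that handles only general entities with a single- or double-quoted literal value. It skips parameter entities (`<!ENTITY % ...>`) and external entities (`SYSTEM`/`PUBLIC`), and it would be fooled by declarations inside comments. The class name `LiteralEntityValues` is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiteralEntityValues
{
    // Matches <!ENTITY name "value"> or <!ENTITY name 'value'>.
    // Group "q" captures the opening quote so the closing quote must be the same kind.
    // A name cannot start with '%', so parameter entity declarations do not match.
    private static readonly Regex EntityDecl = new Regex(
        @"<!ENTITY\s+(?<name>[^\s%]+)\s+(?<q>[""'])(?<value>.*?)\k<q>\s*>",
        RegexOptions.Singleline);

    // Returns each entity's value exactly as written in the given internal subset text.
    public static Dictionary<string, string> FromInternalSubset(string internalSubset)
    {
        var values = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(internalSubset))
        {
            return values;
        }

        foreach (Match m in EntityDecl.Matches(internalSubset))
        {
            values[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return values;
    }
}
```

Usage would be along the lines of `LiteralEntityValues.FromInternalSubset(doc.DocumentType.InternalSubset)`. Taking a string rather than an `XmlDocument` keeps the matching logic independent of how the DTD text was obtained.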


Solution

  • It's probably not the most elegant solution, but I came up with something that works well enough for my purposes. First, I parsed the original document and retrieved its entity declarations. Then I created a small in-memory XML document and copied the original DOCTYPE into it, including the internal subset that declares the entities. Next, I added a reference to each entity inside the temporary document. Finally, I read the InnerXml of each reference.

    Here's some sample code:

            using System.Collections.Generic;
            using System.Net;
            using System.Xml;
    
            // parse the original document (path is the file to inspect) and retrieve its entities;
            // the resolver lets an external DTD be fetched with the current user's credentials
            XmlDocument parsedXmlDocument = new XmlDocument();
            XmlUrlResolver resolver = new XmlUrlResolver();
            resolver.Credentials = CredentialCache.DefaultCredentials;
            parsedXmlDocument.XmlResolver = resolver;
            parsedXmlDocument.Load(path);
    
            // create a temporary XML document with the same DOCTYPE (including the internal subset);
            // references added to it can then be used to retrieve the value of each entity
            XmlDocument entitiesXmlDocument = new XmlDocument();
            XmlDeclaration dec = entitiesXmlDocument.CreateXmlDeclaration("1.0", null, null);
            entitiesXmlDocument.AppendChild(dec);
            XmlDocumentType sourceDocType = parsedXmlDocument.DocumentType;
            XmlDocumentType newDocType = entitiesXmlDocument.CreateDocumentType(sourceDocType.Name, sourceDocType.PublicId, sourceDocType.SystemId, sourceDocType.InternalSubset);
            entitiesXmlDocument.AppendChild(newDocType);
            XmlElement root = entitiesXmlDocument.CreateElement("xmlEntitiesDoc");
            entitiesXmlDocument.AppendChild(root);
            XmlNamedNodeMap entitiesMap = entitiesXmlDocument.DocumentType.Entities;
    
            // build a dictionary of entity names and values
            Dictionary<string, string> entitiesDictionary = new Dictionary<string, string>();
            for (int i = 0; i < entitiesMap.Count; i++)
            {
                string entityName = entitiesMap.Item(i).Name;
                // wrap a reference to the entity in an element named after it;
                // once appended, the reference is expanded, so its InnerXml holds the value
                XmlElement entityElement = entitiesXmlDocument.CreateElement(entityName);
                XmlEntityReference entityRefElement = entitiesXmlDocument.CreateEntityReference(entityName);
                entityElement.AppendChild(entityRefElement);
                root.AppendChild(entityElement);
                // do not add parameter entities or invalid entities;
                // this can be determined by checking for an empty string
                if (!string.IsNullOrEmpty(entityRefElement.InnerXml))
                {
                    entitiesDictionary.Add(entityName, entityRefElement.InnerXml);
                }
            }
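The steps above can also be packaged as a self-contained method. This is a variation on the answer's approach, not the author's code. Instead of building a second document, it creates a detached element in the loaded document, appends an entity reference to it, and reads the reference's `InnerXml`. Because the element is never inserted into the document tree, the loaded document is left unchanged. It relies on the same behavior the answer's code does: appending an `XmlEntityReference` to an element expands it from the DOCTYPE's declarations. The names `ExpandedEntityValues`, `FromDocument`, and `holder` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ExpandedEntityValues
{
    // For each declared entity, expand a reference to it and return the markup it produces.
    public static Dictionary<string, string> FromDocument(XmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        if (doc.DocumentType == null)
        {
            return values; // no DOCTYPE, nothing declared
        }

        foreach (XmlNode entity in doc.DocumentType.Entities)
        {
            // Detached holder element: CreateElement does not insert it into the tree,
            // so the loaded document itself is not modified.
            XmlElement holder = doc.CreateElement("holder");
            XmlEntityReference reference = doc.CreateEntityReference(entity.Name);
            holder.AppendChild(reference); // the reference is expanded once it has a parent

            // Same filter as the answer: skip entities that expand to nothing.
            string value = reference.InnerXml;
            if (!string.IsNullOrEmpty(value))
            {
                values.Add(entity.Name, value);
            }
        }
        return values;
    }
}
```

Compared with the answer's code, this skips the second document. If the entities are declared in an external DTD, the document still needs an `XmlResolver` when it is loaded, as the answer sets up.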