I have a routine that parses an XML response from an HTTP request, and I use XmlDocument.LoadXml to do this. I rely on this method throwing an exception on bad XML and returning a populated XmlDocument object when it succeeds.
What I didn't expect was for it to hang for several minutes while loading a document. When I run the code below in a test environment, it hangs for several minutes every time. It looks like a bug in .NET to me...
' Requires Imports System.Xml at the top of the file.
Dim tstring As String = ""
tstring &= "" & vbCrLf
tstring &= "" & vbCrLf
tstring &= "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">" & vbCrLf
tstring &= "" & vbCrLf
tstring &= "<html xmlns=""http://www.w3.org/1999/xhtml"" >" & vbCrLf
tstring &= "<head><title>" & vbCrLf
tstring &= " Error" & vbCrLf
tstring &= "</title></head>" & vbCrLf
tstring &= "<body>" & vbCrLf
tstring &= "</body>" & vbCrLf
tstring &= "</html>" & vbCrLf
Dim MyXmlDoc As New XmlDocument
' This call hangs for several minutes.
MyXmlDoc.LoadXml(tstring)
The specific line in the document that can be removed to keep it from hanging is:
tstring &= "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">" & vbCrLf
Do I have to search the string for "<!DOCTYPE html" and skip LoadXml() when I find it? My concern with that approach is: what other gotchas are waiting for me inside this method?
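The LoadXml call parses the DOCTYPE and, unless told otherwise, resolves the external DTD it references, so it has to fetch that URL. It does this to read the DTD's declarations (such as entity definitions), not to validate the document. That request is what is slow in this case (the W3C throttles heavy automated traffic to its DTD URLs); you can test it by opening the URL directly in your browser.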
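To confirm that the delay really comes from the DTD download, one option is to give the XmlDocument a resolver that records every URI the parser asks for and returns an empty stream instead of touching the network. This is a minimal sketch, not a library API: LoggingNullResolver, DtdProbe and RequestedUris are hypothetical names. It relies only on subclassing System.Xml.XmlResolver and assigning it to XmlDocument.XmlResolver.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Xml

' Hypothetical diagnostic resolver: records each URI the parser requests
' and returns an empty stream instead of downloading anything.
Public Class LoggingNullResolver
    Inherits XmlResolver

    Public ReadOnly Requested As New List(Of Uri)()

    ' Nothing is fetched, so credentials are accepted and ignored.
    Public Overrides WriteOnly Property Credentials As ICredentials
        Set(value As ICredentials)
        End Set
    End Property

    Public Overrides Function GetEntity(absoluteUri As Uri, role As String, ofObjectToReturn As Type) As Object
        Requested.Add(absoluteUri)
        ' An empty external DTD subset is legal, so parsing can continue.
        Return New MemoryStream()
    End Function
End Class

Public Module DtdProbe
    ' Hypothetical helper: loads the string and reports which external
    ' resources LoadXml tried to resolve along the way.
    Public Function RequestedUris(xml As String) As List(Of Uri)
        Dim resolver As New LoggingNullResolver()
        Dim doc As New XmlDocument()
        doc.XmlResolver = resolver
        doc.LoadXml(xml)
        Return resolver.Requested
    End Function
End Module
```

Run against the string from the question, the returned list should contain the xhtml1-transitional.dtd URL, and the call returns immediately because the resolver never goes to the network.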
Another question provides a workaround. To quote:
In .NET 4.0, XmlTextReader has a property called DtdProcessing. When set to DtdProcessing.Ignore, it should disable DTD processing.
and
doc.XmlResolver = null;
for .NET 3.5 should work.
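Both workarounds can be sketched in VB.NET as drop-in replacements for the LoadXml call. The module and function names (SafeXml, LoadXmlIgnoringDtd, LoadXmlWithoutResolver) are hypothetical. The first variant uses XmlReaderSettings.DtdProcessing, which assumes .NET 4.0 or later. The second only clears XmlDocument.XmlResolver, which should also work on 3.5. In both cases malformed input still raises an XmlException, so the "throw on bad XML" contract from the question is kept.

```vbnet
Imports System.IO
Imports System.Xml

Public Module SafeXml
    ' .NET 4.0 and later: read through an XmlReader that skips the
    ' DOCTYPE entirely instead of parsing it.
    Public Function LoadXmlIgnoringDtd(xml As String) As XmlDocument
        Dim settings As New XmlReaderSettings()
        settings.DtdProcessing = DtdProcessing.Ignore
        ' Not strictly needed once the DTD is ignored, but it keeps the
        ' reader from fetching any other external resource.
        settings.XmlResolver = Nothing
        Dim doc As New XmlDocument()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml), settings)
            doc.Load(reader) ' still throws XmlException on malformed input
        End Using
        Return doc
    End Function

    ' .NET 3.5: there is no DtdProcessing setting, but with no resolver
    ' the parser has no way to download the external DTD.
    Public Function LoadXmlWithoutResolver(xml As String) As XmlDocument
        Dim doc As New XmlDocument()
        doc.XmlResolver = Nothing
        doc.LoadXml(xml)
        Return doc
    End Function
End Module
```

One trade-off to keep in mind: with the DTD skipped or unresolved, named entities that only the XHTML DTD defines (for example &nbsp;) become undeclared and will make the load fail, so this suits responses that stick to the five predefined XML entities.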