Tags: stream, blob, biztalk, biztalk-2013

BizTalk: Analyze binary blob hiding in XmlDocument?


I'm using BizTalk 2013 R1 to download a binary blob from a website via HTTP. When I receive the blob, I'm just storing the message in an XmlDocument. However, sometimes the site returns the files I want, and sometimes it returns errors in the form of HTML pages containing error information.

I've attempted to screen for this by running XPath on my return message. In particular, I'm looking for occurrences of "Error" in /html/head/title. My thinking is that if it finds that text, or if the message parses as XML at all, I've gotten an error and I should throw an exception.

In practice though, I get this when I attempt to run that XPath:

System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 128.30.52.100:80
   at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.GetResponse()
   at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
   at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
   at System.Xml.XmlTextReaderImpl.OpenAndPush(Uri uri)
   at System.Xml.XmlTextReaderImpl.PushExternalEntityOrSubset(String publicId, String systemId, Uri baseUri, String entityName)
   at System.Xml.XmlTextReaderImpl.DtdParserProxy_PushExternalSubset(String systemId, String publicId)
   at System.Xml.DtdParser.ParseExternalSubset()
   at System.Xml.DtdParser.Parse(Boolean saveInternalSubset)
   at System.Xml.DtdParser.System.Xml.IDtdParser.ParseInternalDtd(IDtdParserAdapter adapter, Boolean saveInternalSubset)
   at System.Xml.XmlTextReaderImpl.ParseDtd()
   at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlDocument.Load(XmlReader reader)
   at System.Xml.XmlDocument.Load(TextReader txtReader)
   at Microsoft.XLANGs.Core.Value.GetXmlDocument()
   at Microsoft.XLANGs.Core.Value.RetrieveAs(Type t)
   at Microsoft.XLANGs.Core.Part.get_XmlDocument()
   at Microsoft.XLANGs.Core.Part.XPathLoad(Part sourcePart, String xpath, Type dstType)
   at QTC.BizTalk.LSPDispatchIMNL.SendCommercialInvoice.segment3(StopConditions stopOn)
   at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)

Seeing this makes sense, since I believe BizTalk handles messages as streams in the background. Suddenly, the technique of hiding binary data in an XmlDocument makes sense too. So perhaps my test itself is causing the problem by forcing the message to load.

I would like to be able to validate my response in some way, however. Is there anything I can do to analyze the response I get from the site, without causing the message to load? There's nothing all that useful in the context properties so I'm curious what I can do.


Solution

  • I'm not really sure how to make sense of your error (especially without seeing the code you're actually using to check the message), but either way I think you should do this in a custom pipeline component, for a few reasons.

    1. Loading the message into an XmlDocument in the orchestration is going to be prohibitively expensive if you're dealing with large binary objects.
    2. Trying to use XPath on binary data won't work.
    3. Trying to use XPath on HTML won't always work, since HTML is not necessarily well-formed XML.

    You could very easily check the message size in a pipeline component (pInMsg.BodyPart.GetOriginalDataStream().Length, for example). You could also read just the first few bytes of the stream and check them for known markers, which is far cheaper than loading the whole message.
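
As a rough illustration of the "read the first few bytes" idea, here is a minimal sketch in plain C#. It is not taken from the answer: the class name `ResponseSniffer`, the method `LooksLikeMarkup`, and the 512-byte peek size are all made up for this example, and it depends only on `System.IO` so it can be tried outside BizTalk. It assumes the stream is seekable and that an error page starts with `<!DOCTYPE`, `<html` or `<?xml` in an ASCII-compatible encoding; a UTF-16 error page would not be detected.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of BizTalk): peeks at the start of a stream
// to guess whether it holds markup (HTML/XML) rather than binary content.
public static class ResponseSniffer
{
    public static bool LooksLikeMarkup(Stream body, int peekBytes = 512)
    {
        if (body == null) throw new ArgumentNullException("body");
        if (!body.CanSeek) throw new ArgumentException("Stream must be seekable.", "body");
        if (peekBytes <= 0) throw new ArgumentOutOfRangeException("peekBytes");

        long start = body.Position;
        try
        {
            // Read up to peekBytes; Read may return fewer bytes per call.
            byte[] buffer = new byte[peekBytes];
            int total = 0;
            int read;
            while (total < buffer.Length &&
                   (read = body.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }

            // Skip a UTF-8 byte order mark if one is present.
            int offset = 0;
            if (total >= 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF)
            {
                offset = 3;
            }

            // ASCII decoding is enough to spot the markers we care about.
            string head = Encoding.ASCII.GetString(buffer, offset, total - offset).TrimStart();

            return head.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase)
                || head.StartsWith("<html", StringComparison.OrdinalIgnoreCase)
                || head.StartsWith("<?xml", StringComparison.OrdinalIgnoreCase);
        }
        finally
        {
            // Rewind so downstream components see the message from the beginning.
            body.Position = start;
        }
    }
}
```

In a pipeline component this would presumably be called on the stream returned by pInMsg.BodyPart.GetOriginalDataStream(). Two caveats apply: `Stream.Length` and `Position` throw `NotSupportedException` on non-seekable streams, so a forward-only network stream would first need a seekable wrapper (BizTalk ships `ReadOnlySeekableStream` in `Microsoft.BizTalk.Streaming`, if that assembly is available to you), and a leading-bytes check is only a heuristic, so a binary format that happens to start with `<` would be misclassified.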