What causes a SOAP service to keep disconnecting TLS clients after responding to a single message?

I loaded a client-side .svclog file into Microsoft Service Trace Viewer, and the log contains many "setting up secure session" and "close secure session" entries. On the server side, I can see many instances of trust/RST/SCT/Cancel, which I take to mean that the connections are being closed by the server, but only after it has responded to a SOAP message. It looks as though every web service call sets up a TLS session for SOAP, the connection is closed immediately after the response is sent, and TLS has to be set up again for the very next call.

I read this article: https://blogs.technet.microsoft.com/tspring/2015/02/23/poor-mans-guide-to-troubleshooting-tls-failures/

It said:

Keep in mind that TCP resets should always be expected at some point as the client closes out the session to the server. However, if there are a high volume of TCP resets with little or no “Application Data” (traffic which contains the encapsulated encrypted data between client and server) then you likely have a problem. Particularly if the server side is resetting the connection as opposed to the client.

Unfortunately, the article doesn't expand on this, even though it describes exactly what I am seeing!

This is a net.tcp web service installed in some customer environment, set up to use Windows authentication.

What's the next step in my diagnosis?

Solution

Most likely the behavior you are seeing is normal, and unless you are experiencing actual problems I would not be concerned. The Microsoft article you quote refers to TCP resets, but you said your logs show trust/RST/SCT/Cancel entries, and in that context RST means RequestSecurityToken. In other words, your log messages in no way imply that TCP reset (RST) frames are occurring.

The Web Services Secure Conversation Language (WS-SecureConversation) spec says:

It is not uncommon for a requestor to be done with a security context token before it expires. In such cases the requestor can explicitly cancel the security context using this specialized binding based on the WS-Trust Cancel binding. The following Action URIs are used with this binding:

http://schemas.xmlsoap.org/ws/2005/02/trust/RST/SCT/Cancel
http://schemas.xmlsoap.org/ws/2005/02/trust/RSTR/SCT/Cancel

Once a security context has been cancelled it MUST NOT be allowed for authentication or authorization or allow renewal. Proof of possession of the key associated with the security context MUST be proven in order for the context to be cancelled.
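To get a feel for how often security contexts are issued and cancelled, you could count these action URIs in the trace. The following is only a minimal sketch: it assumes the trace can be read as plain text and that the action URIs appear in it verbatim. The function name and the sample excerpt are made up for illustration and are not taken from your log.

```python
import re
from collections import Counter

# Matches the WS-Trust action URIs used for security context tokens (SCT),
# e.g. http://schemas.xmlsoap.org/ws/2005/02/trust/RST/SCT/Cancel
ACTION_RE = re.compile(
    r"http://schemas\.xmlsoap\.org/ws/2005/02/trust/(RSTR?/SCT(?:/Cancel)?)"
)

def count_sct_actions(trace_text: str) -> Counter:
    """Count SCT issue and cancel actions found in the trace text."""
    return Counter(m.group(1) for m in ACTION_RE.finditer(trace_text))

# Hypothetical excerpt standing in for the contents of a .svclog file.
sample = """
<Action>http://schemas.xmlsoap.org/ws/2005/02/trust/RST/SCT</Action>
<Action>http://schemas.xmlsoap.org/ws/2005/02/trust/RSTR/SCT</Action>
<Action>http://schemas.xmlsoap.org/ws/2005/02/trust/RST/SCT/Cancel</Action>
<Action>http://schemas.xmlsoap.org/ws/2005/02/trust/RSTR/SCT/Cancel</Action>
"""

counts = count_sct_actions(sample)
for action, n in sorted(counts.items()):
    print(f"{action}: {n}")
```

If the number of RST/SCT/Cancel entries roughly matches the number of service calls, that would suggest (but not prove) that the client is ending its secure conversation after each call, which is consistent with the requestor-initiated cancel described in the spec.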

If you actually are experiencing transport problems due to unexpected TCP RST frames, or if you are seeing them and are curious to understand their underlying cause, then you'll need to capture network traffic to see how and why TCP resets are occurring, and whether they are normal or abnormal.

I'd do that by firing up Wireshark and looking at the frames. If you see FIN, ACK messages from each side, you can expect the connection to be closed gracefully after a waiting period. Otherwise you'll see RST frames, which can occur for a variety of reasons: application resets (performed to avoid tying up a lot of ports in Wait states), a bad sequence number when re-accessing a port that's in a Wait state, router or firewall RST messages (typically sent in both directions), retransmit timeouts, port choice RST messages, and others.
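In Wireshark itself, the display filter `tcp.flags.reset == 1` shows only the RST frames. If you export the flags of one connection, the graceful-versus-reset distinction above can be sketched roughly as follows. The function name and the (sender, flags) record format are assumptions made for this illustration, not something Wireshark produces in exactly this form.

```python
from typing import Iterable, Tuple

def classify_teardown(segments: Iterable[Tuple[str, str]]) -> str:
    """Classify how a single TCP connection ended.

    segments: (sender, flags) pairs in capture order,
    e.g. ("client", "FIN,ACK"). This format is hypothetical.
    """
    fin_senders = set()
    for sender, flags in segments:
        flag_set = {f.strip().upper() for f in flags.split(",")}
        if "RST" in flag_set:
            # The first RST seen decides the outcome and names its sender.
            return f"reset by {sender}"
        if "FIN" in flag_set:
            fin_senders.add(sender)
    if len(fin_senders) >= 2:
        return "graceful close"   # both sides sent FIN
    if fin_senders:
        return "half-closed"      # only one side sent FIN
    return "still open"           # no FIN or RST observed

graceful = [("client", "FIN,ACK"), ("server", "ACK"),
            ("server", "FIN,ACK"), ("client", "ACK")]
aborted = [("client", "PSH,ACK"), ("server", "RST,ACK")]

print(classify_teardown(graceful))
print(classify_teardown(aborted))
```

Knowing which side sent the RST matters here, because the article you quoted singles out server-side resets as the more suspicious case.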

There are lots of resources to help with TCP traffic analysis. You might find it helpful to take a look at https://blogs.technet.microsoft.com/networking/2009/08/12/where-do-resets-come-from-no-the-stork-does-not-bring-them/ for a quick overview.

If you're not familiar with Wireshark it can seem a little complicated, but what you want to do here is very simple, and you can get your answer quickly even with no prior experience. Just search for Wireshark tutorials and you'll find one that fits your cognitive style.

You can also use Wireshark to troubleshoot higher-level protocols, including TLS. You can find information about that in many places. I'll just list a few to get you started:

The Wireshark documentation on SSL.

The Wikiversity section on HTTPS.

A 5-minute YouTube tutorial on looking at SSL traffic.

I believe this covers your next diagnostic step reasonably well, but if not, feel free to post more information and I can try to provide a better answer.