Tags: c#, asp.net, sockets, tcp, windows-server-2012

Windows Server 2012 TCP Ports Increase until CPU Maximisation/Crash


I'm currently running an ASP.NET application on our Windows Server 2012 environment. Over the past few days our website and services have been hitting errors and dropping out.

When looking in the Resource Monitor after the services start to go down, we can see that the CPU is locked at 100% usage. We thought it might be a memory leak, so we increased the amount of memory available to our applications and watched the memory usage, and there was no change.

Using Process Monitor (ProcMon) at the very start of the crash (I was lucky to be watching it at the time), I could see the w3wp.exe process being hit with hundreds of TCP Send/TCP Receive operations to itself, from varying ports. For example, it looks something like this:

TCP Send HOST-01234:65142 -> HOST-01234:49685
TCP Send HOST-01234:65143 -> HOST-01234:49685
TCP Send HOST-01234:65145 -> HOST-01234:49685
TCP Receive HOST-01234:65145 -> HOST-01234:49685
TCP Send HOST-01234:65146 -> HOST-01234:49685
TCP Send HOST-01234:65147 -> HOST-01234:49685
TCP Receive HOST-01234:65147 -> HOST-01234:49685
TCP Send HOST-01234:65149 -> HOST-01234:49685

As soon as several hundred of these events start being recorded, the service goes down and we need to recycle our application pool manually. I know it is obviously something in our code causing the issue, but I'm a bit stuck as to where. When we try to create mini-dumps, they are less than helpful.

I was wondering whether the WebSocket code we have integrated into our website could be the culprit, possibly opening ports and never closing them. Is that even possible?

Are there any other possible explanations for this behaviour?

EDIT 30/05/2014

After some more digging using Process Explorer and sifting through the few dozen threads of w3wp.exe, I've found the source of our ills.

It appears that an HTTP PUT request to another server is causing our service to hang, forcing the CPU to lock at 100% until it gets a response. Why it spikes to 100% I do not know, but at least I now have the source of the problem.
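For illustration only, here is a rough sketch of the kind of pattern that can produce this behaviour, assuming a timer that fires every few seconds and issues a synchronous PUT with no explicit timeout; the class name, endpoint URL and payload are placeholders, not our actual code. Because the default HttpWebRequest timeout (100 seconds) is far longer than the interval, a slow remote server leaves each call blocking a thread pool thread while new calls keep piling up behind it.

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;

class StatusPusher
{
    // Placeholder endpoint; the real operation pushed to another internal server.
    private const string TargetUrl = "http://other-server/api/status";
    private static Timer _timer;

    public static void Start()
    {
        // Fires every 5 seconds regardless of whether the previous call has finished.
        _timer = new Timer(_ => PushStatus(), null, TimeSpan.Zero, TimeSpan.FromSeconds(5));
    }

    private static void PushStatus()
    {
        var request = (HttpWebRequest)WebRequest.Create(TargetUrl);
        request.Method = "PUT";
        request.ContentType = "application/json";
        // No request.Timeout is set, so the 100-second default applies -
        // far longer than the 5-second interval, which lets calls overlap.

        byte[] body = Encoding.UTF8.GetBytes("{\"status\":\"alive\"}");
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        // Blocks this thread pool thread until the slow server responds.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
        }
    }
}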


Solution

  • It turned out that one of our service operations was sending an HTTP request to a slow server every few seconds. The other server would sometimes not respond before the next request was sent, so the outstanding requests would build up until the CPU maxed out at 100% and brought down our web server.

    I resolved it by implementing request timeouts and by gating the sending on a boolean flag in our database that determines whether a request actually needs to be sent, rather than sending one unconditionally every few seconds (a rough sketch follows below).
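To make the fix concrete, here is a minimal sketch of the changed send path under a few assumptions: the Database helper, its ShouldSendRequest/MarkRequestSent methods, the endpoint URL and the 5-second timeout values are all hypothetical stand-ins for our real code, shown only to illustrate the timeout plus flag-check approach.

using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical stand-in for the real data layer holding the boolean flag.
static class Database
{
    public static bool ShouldSendRequest() { /* read the flag */ return true; }
    public static void MarkRequestSent()   { /* clear the flag */ }
}

class StatusPusher
{
    private const string TargetUrl = "http://other-server/api/status";

    public static void PushStatusIfNeeded()
    {
        // Only send when the database flag says a request is actually needed,
        // instead of sending one unconditionally every few seconds.
        if (!Database.ShouldSendRequest())
            return;

        var request = (HttpWebRequest)WebRequest.Create(TargetUrl);
        request.Method = "PUT";
        request.ContentType = "application/json";
        request.Timeout = 5000;          // fail fast instead of waiting on a slow server
        request.ReadWriteTimeout = 5000; // also bound the time spent sending/reading the body

        byte[] body = Encoding.UTF8.GetBytes("{\"status\":\"alive\"}");
        request.ContentLength = body.Length;

        try
        {
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(body, 0, body.Length);
            }
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Database.MarkRequestSent();
            }
        }
        catch (WebException)
        {
            // A timeout or connection failure no longer ties up a worker thread;
            // the flag stays set so a later cycle can retry.
        }
    }
}

The key point is that a slow or unresponsive downstream server now costs at most one bounded request per cycle instead of an ever-growing backlog of blocked worker threads.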