Tags: c#, .net, asp.net-core, iis, tcp

ASP.NET Core + .NET Framework Web API - frequent network errors


I have an ASP.NET Core 2.2 Web API targeting .NET Framework 4.7.2, running under IIS out of process. I am seeing a lot of outgoing network failures. The app connects to various resources (SQL databases, SSAS via XMLA, etc.), and connections to all of them fail intermittently. At first I thought it was only SQL connections, but it appears to affect networking in general. Some sort of failure occurs very regularly, in around 1 in 10 requests. I have this in web.config:

    <system.net>
      <connectionManagement>
        <add address="*" maxconnection="65535"/>
      </connectionManagement>
    </system.net>

and this in Startup.cs:

    ServicePointManager.DefaultConnectionLimit = int.MaxValue;

However, it's not just HTTP connections that are failing; it's mostly SQL, SSAS, and general TCP.

The failures are things like SQL error 26 (unable to connect), with similar errors for other non-SQL network resources. They are intermittent, and there is hardly any load on the box at all. They seem to happen when API calls are made back to back.

I don't think it's a general network (router/switch) issue. For example, I can run a scripted console SQL connect/select/teardown against one of the remote services that the app fails to connect to, and it never fails, even in a repeating loop running at the same time as the app reports errors connecting to the same DB. There must be some TCP/network tuning I am missing, and I would be grateful for any suggestions.
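For illustration, the kind of repeating connect/teardown check described above could look roughly like the sketch below. It is not the actual script: it only opens and closes a plain TCP connection (no SQL login or query), and the `ConnectProbe` class, the host name `sqlhost.example.local`, and port 1433 are hypothetical placeholders.

```csharp
using System;
using System.Net.Sockets;

public static class ConnectProbe
{
    // Opens and closes a TCP connection `attempts` times and returns
    // how many attempts failed (refused, unreachable, or timed out).
    public static int CountFailures(string host, int port, int attempts, int timeoutMs = 3000)
    {
        int failures = 0;
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                using (var client = new TcpClient())
                {
                    // Wait returns false if the connect did not finish in time,
                    // and throws AggregateException if the connect itself failed.
                    bool connected = client.ConnectAsync(host, port).Wait(timeoutMs);
                    if (!connected) failures++;
                }
            }
            catch (Exception ex) when (ex is AggregateException || ex is SocketException)
            {
                failures++;
            }
        }
        return failures;
    }
}

// Hypothetical usage (host and port are placeholders):
//   int failed = ConnectProbe.CountFailures("sqlhost.example.local", 1433, 100);
//   Console.WriteLine($"{failed} of 100 connects failed");
```

Running the same probe once from a console and once from inside the IIS-hosted process would show whether the failures follow the machine or only the app's process.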


Solution

  • You have a very unusual issue. The reason to use large packet sizes is that, with a very small error rate, the more data you send, the more errors you will see. The error rate itself should be constant, and you will always get some errors. In this case, though, the errors are not constant. They only start when data is split into more than one packet, which happens when you send more than 1500 bytes.

    You probably have a multi-hop connection, where the client and server are not directly connected by a single cable. Traffic goes through routers and servers, each of which has its own unique IP address (and name).

    Normally, errors like this are fixed by replacing cables or pieces of hardware. You can either start replacing hardware at random, or use ping from and to different servers and routers to find which paths work and which give errors, which helps isolate the issue.

    In your case, it looks like one of your devices was not designed correctly. Errors starting at 1500 bytes are usually due to a vendor not fully testing its devices and not fixing errors found during alpha and beta testing. Most vendors do a very good job of testing and certifying their devices, so this could be a "Made in China" issue, or maybe even a virus. Either way, to fix the issue, start replacing hardware or use ping to isolate which routes work and which give errors.
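The ping-based isolation suggested above can be sketched in C# with `System.Net.NetworkInformation.Ping`, sending echo requests of increasing size with the Don't Fragment flag set. This is only a rough sketch: the `MtuProbe` class, the payload sizes, and the host name in the usage comment are assumptions chosen for illustration. A 1472-byte payload plus the 8-byte ICMP header and 20-byte IPv4 header fills a 1500-byte packet exactly, so sizes on either side of 1472 show whether the path behaves differently once data no longer fits in one packet.

```csharp
using System;
using System.Net.NetworkInformation;

public static class MtuProbe
{
    // Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
    // MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
    public static int MaxPayloadForMtu(int mtu)
    {
        return mtu - 20 - 8;
    }

    // Sends one echo request with Don't Fragment set and returns the reply status.
    public static IPStatus PingOnce(string host, int payloadBytes, int timeoutMs = 2000)
    {
        using (var ping = new Ping())
        {
            var options = new PingOptions(64, true); // ttl = 64, dontFragment = true
            PingReply reply = ping.Send(host, timeoutMs, new byte[payloadBytes], options);
            return reply.Status;
        }
    }

    // Pings one host with several payload sizes and prints the status of each.
    public static void Sweep(string host)
    {
        int[] sizes = { 64, 512, 1400, MaxPayloadForMtu(1500), 1473, 2000 };
        foreach (int size in sizes)
        {
            try
            {
                Console.WriteLine($"{host} payload={size}: {PingOnce(host, size)}");
            }
            catch (PingException ex)
            {
                // Thrown, for example, when the host name cannot be resolved.
                Console.WriteLine($"{host} payload={size}: failed ({ex.Message})");
            }
        }
    }
}

// Hypothetical usage, repeated for each router or server along the path:
//   MtuProbe.Sweep("router1.example.local");
```

Running the sweep against each hop in turn, and from more than one source machine, narrows down which device or link starts returning errors.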