Tags: c#, multithreading, tcp, communication, scalability

High-Availability TCP server application


In my project I have a cloud-hosted virtual machine running a C# application which needs to:

  1. accept TCP connections from several external clients (approximately 500)
  2. receive data asynchronously from the connected clients (at low frequency, approximately 1 message per minute)
  3. do some processing on the received data
  4. forward the received data to other actors
  5. reply to the connected clients, and possibly do some asynchronous sending (based on internal time checks)

The design seems quite straightforward to me. I provide a listener which accepts incoming TCP connections; when a new connection is established, a new thread is spawned. That thread runs in a loop (performing steps 2 to 5) and checks that the associated socket is still alive; if the socket is dead, the thread exits the loop and eventually terminates (later, a new connection will be attempted by the external client the socket belonged to).
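
For context, a minimal sketch of that thread-per-connection pattern (ThreadPerClientServer and HandleClient are illustrative names, and a plain echo stands in for the real processing):

    using System.Net;
    using System.Net.Sockets;
    using System.Threading;

    public static class ThreadPerClientServer
    {
        public static void Run(int port)
        {
            var listener = new TcpListener(IPAddress.Any, port);
            listener.Start();
            while (true)
            {
                // step 1: accept a connection, then dedicate a thread to it
                TcpClient client = listener.AcceptTcpClient();
                new Thread(() => HandleClient(client)) { IsBackground = true }.Start();
            }
        }

        private static void HandleClient(TcpClient client)
        {
            using (client)
            {
                NetworkStream stream = client.GetStream();
                var buffer = new byte[4096];
                while (true)
                {
                    int read = stream.Read(buffer, 0, buffer.Length); // blocks this thread
                    if (read == 0) break; // socket dead: exit the loop, thread terminates
                    // steps 3-5 would go here; echoing the data back stands in for them
                    stream.Write(buffer, 0, read);
                }
            }
        }
    }

Each connected client permanently occupies one thread, even while it sits idle between messages.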

So now the issue is that with a limited number of external clients (I would say 200-300) everything runs smoothly, but as that number grows (or when the clients send data at a higher frequency) the communication becomes very slow and congested.

I was thinking about some better design, for example:

  • using Tasks instead of Threads (see the async sketch after these lists)
  • using ThreadPool
  • replace 1Thread1Socket with something like 1Thread10Socket

or even some scaling strategies:

  • open two different TCP listeners (on different ports) within the same application (reconfiguring clients so that half of them target each listener)
  • run two identical applications with two different TCP listeners (on different ports) on the same virtual machine
  • set up two different virtual machines with the same application running on each of them (reconfiguring clients so that half of them target each virtual machine's address)
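
On the first of those ideas - Tasks instead of Threads - here is a hedged sketch of how the same loop could become async (AsyncServer and HandleClientAsync are illustrative names; the echo again stands in for steps 3 to 5). An awaited read parks no thread, so 500 mostly-idle connections cost a few small state machines rather than 500 blocked threads:

    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    public static class AsyncServer
    {
        public static async Task RunAsync(int port)
        {
            var listener = new TcpListener(IPAddress.Any, port);
            listener.Start();
            while (true)
            {
                TcpClient client = await listener.AcceptTcpClientAsync();
                // fire-and-forget for brevity; production code should observe faults
                _ = HandleClientAsync(client);
            }
        }

        private static async Task HandleClientAsync(TcpClient client)
        {
            using (client)
            {
                NetworkStream stream = client.GetStream();
                var buffer = new byte[4096];
                while (true)
                {
                    // an awaited read holds no thread while the client is idle
                    int read = await stream.ReadAsync(buffer, 0, buffer.Length);
                    if (read == 0) break; // client closed the connection
                    await stream.WriteAsync(buffer, 0, read); // echo as placeholder
                }
            }
        }
    }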

Finally, the questions: is the current design poor or naive? Do you see any major weaknesses in the way I handle the communication? Do you have any more robust and efficient option (among those mentioned above, or any additional one)?

Thanks


Solution

  • The number of listeners is unlikely to be a limiting factor. Here at Stack Overflow we handle ~60k sockets per instance, and the only reason we need multiple listeners is so we can split the traffic over multiple ports to avoid ephemeral port exhaustion at the load balancer. Likewise, I should note that those 60k per-instance socket servers run at basically zero CPU, so: it is premature to think about multiple exes, VMs, etc. That is not the problem. The problem is the code, and distributing a poor socket infrastructure over multiple processes just hides the problem.

    Writing high performance socket servers is hard, but the good news is: you can avoid most of this. Kestrel (the ASP.NET Core http server) can act as a perfectly good TCP server, dealing with most of the horrible bits of async, sockets, buffer management, etc for you, so all you have to worry about is the actual data processing. The "pipelines" API even deals with back-buffers for you, so you don't need to worry about over-read.
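
    To make that concrete, here is a minimal sketch of a Kestrel-hosted TCP handler; the port, the ClientHandler name, and the newline-delimited framing are assumptions for illustration, while ConnectionHandler and the pipelines Transport.Input/Transport.Output surface are what Kestrel actually exposes:

        using System;
        using System.Buffers;
        using System.IO.Pipelines;
        using System.Threading.Tasks;
        using Microsoft.AspNetCore.Builder;
        using Microsoft.AspNetCore.Connections;
        using Microsoft.AspNetCore.Hosting;
        using Microsoft.Extensions.Hosting;

        public class ClientHandler : ConnectionHandler
        {
            // Kestrel calls this once per client; it owns the socket, buffers and back-pressure
            public override async Task OnConnectedAsync(ConnectionContext connection)
            {
                PipeReader input = connection.Transport.Input;
                PipeWriter output = connection.Transport.Output;
                while (true)
                {
                    ReadResult result = await input.ReadAsync();
                    ReadOnlySequence<byte> buffer = result.Buffer;

                    // assumed framing: one message per '\n'; partial messages stay buffered
                    while (TryReadLine(ref buffer, out ReadOnlySequence<byte> message))
                    {
                        foreach (ReadOnlyMemory<byte> segment in message)
                            await output.WriteAsync(segment); // reply; real code would process/forward
                        await output.WriteAsync(new byte[] { (byte)'\n' });
                    }

                    // consumed what we parsed; examined everything, so the pipe waits for more data
                    input.AdvanceTo(buffer.Start, buffer.End);
                    if (result.IsCompleted) break; // client disconnected
                }
            }

            private static bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
            {
                SequencePosition? eol = buffer.PositionOf((byte)'\n');
                if (eol == null) { line = default; return false; }
                line = buffer.Slice(0, eol.Value);
                buffer = buffer.Slice(buffer.GetPosition(1, eol.Value));
                return true;
            }
        }

        public static class Program
        {
            public static Task Main() =>
                Host.CreateDefaultBuilder()
                    .ConfigureWebHostDefaults(web => web
                        .UseKestrel(k => k.ListenAnyIP(9000,
                            listen => listen.UseConnectionHandler<ClientHandler>()))
                        .Configure(app => { })) // no HTTP endpoints; Kestrel hosts raw TCP here
                    .Build()
                    .RunAsync();
        }

    Note the division of labour: Kestrel owns accepting, buffering and back-pressure; the handler only parses and reacts to complete messages.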

    An extensive walkthrough of this is in my 3-and-a-bit part blog series starting here - it is simply way too much information to try and post here. But it links through to a demo server - a dummy server hosted via Kestrel. It can also be hosted without Kestrel, using Pipelines.Sockets.Unofficial, but... frankly I'd use Kestrel. The server shown there is broadly similar (in terms of general initialization - not the actual things it does) to our 60k-per-instance web-socket tier.