Tags: c++, linux, sockets, select, epoll

External code using select() with large numbers of file descriptors


I have a server program written in C++ that runs on Linux and implements a push-based messaging service over TCP. Because of the push approach, there may be a lot of simultaneous connections (I'm planning on about 1 million) that have to be kept open for long periods of time. Most of the time these connections are idle apart from the occasional heartbeat, and transfer usually happens in bursts on many of them at the same time.

To make this work, I'm using epoll to multiplex the large number of sockets, and after raising RLIMIT_NOFILE it does actually work quite well.
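For context, the setup looks roughly like this (a minimal sketch; the limit value is an assumption and error handling is simplified):

```cpp
#include <sys/resource.h>
#include <sys/epoll.h>
#include <cstdio>

int main() {
    // Raise the per-process file descriptor limit. Going above the hard
    // limit requires privileges (or a suitably configured limits.conf).
    rlimit rl{};
    rl.rlim_cur = 1100000;   // assumed value, enough headroom for ~1M connections
    rl.rlim_max = 1100000;
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
        perror("setrlimit");

    // A single epoll instance multiplexes all client sockets.
    int epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); return 1; }

    // For each accepted client socket 'fd':
    //   epoll_event ev{};
    //   ev.events  = EPOLLIN;
    //   ev.data.fd = fd;
    //   epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    return 0;
}
```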

My problem is that I'm also using other types of connections in the same program, most notably FastCGI (using libfcgi from the FastCGI SDK) for accepting HTTP requests, and that library uses select() on its internal file descriptors. This causes problems as soon as those file descriptor values reach or exceed FD_SETSIZE (1024), which is bound to happen if the epoll part of the program uses up most of the fd numbers below 1024.
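The core of the issue is that an fd_set can only represent descriptors whose numeric value is below FD_SETSIZE; passing a larger fd to FD_SET() is undefined behaviour. A small illustrative guard (hypothetical helper, not part of libfcgi):

```cpp
#include <sys/select.h>
#include <cstdio>

// Illustrative check: an fd_set cannot hold descriptors >= FD_SETSIZE.
// This is exactly what breaks select()-based code once fd numbers grow
// past 1024 in a process that also holds hundreds of thousands of sockets.
bool safe_for_select(int fd) {
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d cannot be used with select() (FD_SETSIZE=%d)\n",
                fd, FD_SETSIZE);
        return false;
    }
    return true;
}
```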

I'm wondering what the best way to handle this would be.

Do I just have to modify all external code that uses select() and use poll() instead?
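The conversion is usually mechanical, since poll() takes an explicit array of descriptors and has no upper bound on their numeric values. A hedged sketch of what a select()-style readability wait looks like with poll() (hypothetical helper for illustration):

```cpp
#include <poll.h>
#include <vector>
#include <cstdio>

// Wait for readability on a set of sockets using poll() instead of select().
// With poll() the numeric value of an fd no longer matters, only how many
// descriptors are passed in.
int wait_readable(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds)
        pfds.push_back(pollfd{fd, POLLIN, 0});

    int ready = poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
    if (ready == -1)
        perror("poll");
    return ready;  // number of descriptors with events, 0 on timeout
}
```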

Is there maybe a way to force my epoll-based code (specifically the accept() call) to use only file descriptors above 1024, so that the ones below are reserved for select()-based code?

I understand that it might be possible somehow to increase the value of FD_SETSIZE, but I assume that would hurt performance because select() copies and linearly scans the fd_set on every call, and it strikes me as a hack rather than a real solution.


Solution

  • I would suggest moving these things into their own process so that they get their own file descriptor space. Beyond that, consider a protocol that multiplexes a large number of TCP connections over a single TCP connection. Your server would then talk to the multiplexers rather than to individual clients. A multiplexer could run on the same machine or on a different one; it could handle tens of thousands of client connections while making only a single connection to the server.

    One big advantage of this is that your server machine doesn't have to track a huge number of TCP connection states. It won't have to deal with Internet junk like dropped packets, retransmissions, duplicate packets, rogue SYNs, and slow links, and it won't need a send buffer for every client; it only has to talk to the fast, clean multiplexers, which can do the per-client buffering.

    If this is too much work or for some reason impossible, you can use dup2 (or fcntl with F_DUPFD) to renumber the file descriptors into higher-numbered ones. I strongly suggest using multiplexers though -- the number of connections you're trying to handle is just too large to do from a single process.
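For the descriptor-renumbering fallback, a minimal sketch is shown below. The floor of 1024 mirrors FD_SETSIZE and is an assumption for this example; fcntl(F_DUPFD) is used because it picks the lowest free descriptor at or above the floor, whereas dup2 needs an exact target number:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Move a freshly accepted client socket to a descriptor number >= floor so
// that the low numbers stay free for select()-based code such as libfcgi.
int move_fd_above(int fd, int floor = 1024) {
    int high = fcntl(fd, F_DUPFD, floor);   // duplicate to lowest free fd >= floor
    if (high == -1) {
        perror("fcntl(F_DUPFD)");
        return fd;                          // keep the original on failure
    }
    close(fd);                              // release the low-numbered slot
    return high;
}

// Usage after accept():
//   int client = accept(listen_fd, nullptr, nullptr);
//   if (client != -1) client = move_fd_above(client);
```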