Let me be more specific about my question with an example: Let's say that I have a slew of little servers that all start up on different ports using TCPv4. These ports are going to be destination ports, of course. Let's further assume that these little servers don't just start up at boot time like a typical server, but rather they churn dynamically based on demand. They start up when needed, and may shut themselves down for a while, and then later start up again.
Now let's say that on this same computer, we also have lots of client processes making requests to server processes on other computers via TCPv4. When a client makes such a request, it is assigned a source port by the OS.
Let's say for the sake of this example that a client processes makes a web request to a RESTful server running on a different computer. Let's also say that the source port assigned by the OS to this request is port 7777.
For this example let's also say that while the above request is still occurring, one of our little servers wants to start up, and it wants to start up on destination port 7777.
My question is will this cause a conflict? I.e., will the server get an error because port 7777 is already in use? Or will everything be fine because these two different kinds of ports live in different address spaces that cannot conflict with each other?
One reason I'm worried about the potential for conflict here is that I've seen web pages that say that "ephemeral source port selection" is typically done in a port number range that begins at a relatively high number. Here is such a web page:
https://www.cymru.com/jtk/misc/ephemeralports.html
A natural assumption for why source ports would begin at high numbers, rather than just starting at 1, is to avoid conflict with the destination ports used by server processes. Though I haven't yet seen anything that explicitly comes out and says that this is the case.
P.S. There is, of course, a potential distinction between what the TCPv4 protocol spec has to say on this issue, and what OSes actually do. E.g., perhaps the protocol is agnostic, but OSes tend to only use a single address space? Or perhaps different OSes treat the issue differently?
Personally, I'm most interested at the moment in what Linux would do.
The TCP specification says that connections are identified by the tuple:
{local addr, local port, remote addr, remote port}
Based on this, there theoretically shouldn't be a conflict between a local port used in an existing connection and trying to bind that same port for a server to listen on, since the listening socket doesn't have a remote address/port (these are represented as wildcards in the spec).
However, most TCP implementations, including the Unix sockets API, are more strict than this. If the local port is already in use in any existing socket, you won't be able to bind it, you'll get the error EADDRINUSE
. A special exception is made if the existing sockets are all in TIME_WAIT
state and the new socket has the SO_REUSEADDR
socket option; this is used to allow a server to restart while the sockets left over from a previous process are still waiting to time out.
For this reason, the port range is generally divided into ranges with different uses. When a socket doesn't bind a local port (either because it just called connect()
without calling bind()
, or by specifying IPPORT_ANY
as the port in bind()
), the port is chosen from the ephemeral range, which is usually very high numbered ports. Servers, on the other hand, are expected to bind to low-numbered ports. If network applications follow this convention, you should not run into conflicts.