I'm looking for recommendations on how to achieve low latency for the following network protocol:
Given that steps 1 and 2 do not need to be reliable (as long as a percentage of responses arrive back, we proceed to step 3) and that step 1 is essentially a multicast, this part of the protocol seems to suit UDP - setting up a TCP connection to each of these peers would add an additional round trip.
However step 4 needs to be reliable - we can't tolerate packet loss during the subsequent requests/responses.
The conundrum I'm facing is that UDP suits steps 1 and 2, while TCP suits step 4. Connecting to every peer selected in step 1 is slow, especially since we aim to transmit just 20kb, but UDP's unreliability cannot be tolerated in step 4. Handshaking with the peer selected in step 4 would require an additional round trip, which is still a considerable increase in total time compared to the existing 3 round trips.
Is there some hybrid scheme whereby you can do a TCP handshake while transmitting a small amount of data? (The handshake could be merged into 1 and 2 and hence doesn't add any additional round trip time.)
Is there a name for such protocols? What should I read to become more acquainted with such problems?
Additional info:
There's not enough detail here to offer a well-informed critique. If you were hiring me for advice, I'd want to know a lot more about the proposal, but since I'm doing this for free, I'll just answer the question as asked, and try to make it practical rather than ideal.
I'd argue that UDP is not suitable for the early part of your protocol. You can't just multicast a single packet to a large number of hosts on the Internet (although you can do it on typical LANs). A 20KB payload is not the sort of thing you can generally transmit in a single datagram in any case, and the moment messages fail to fit in a single datagram, UDP loses most of its attraction, because you start reinventing TCP (badly).
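To put rough numbers on the datagram-size point, here is a back-of-the-envelope sketch. The figures are assumptions, not measurements: a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, an 8-byte UDP header, and independent packet loss at a made-up rate.

```python
import math

def datagrams_needed(payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Datagrams required if the application splits the payload to fit the MTU.

    Defaults assume Ethernet and IPv4 without options (illustrative values).
    """
    per_datagram = mtu - ip_header - udp_header  # 1472 payload bytes each
    return math.ceil(payload_bytes / per_datagram)

def delivery_probability(n_datagrams, loss_rate):
    """Chance that every datagram arrives, assuming independent loss."""
    return (1 - loss_rate) ** n_datagrams

n = datagrams_needed(20 * 1024)        # 14 datagrams for 20 KB
p = delivery_probability(n, 0.01)      # roughly 0.87 at an assumed 1% loss
print(n, round(p, 3))
```

Under these assumptions about one message in eight would need some form of retransmission, which is exactly the machinery TCP already provides.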
Probably the simplest thing you can do is base your system on HTTP, and work with implementations which incorporate all the various speed-ups that Google (mostly) has been putting into HTTP development. This includes TCP Fast Open, and things like it. Initiate connections out to your chosen servers; some will respond faster than others: use that to your advantage by going with the quickest ones. Don't underestimate the importance of efficient implementation relative to theoretical round-trip time, by the way.
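The "go with the quickest ones" idea can be sketched with plain TCP connections in Python's asyncio. This is only an illustration: `fastest_peers`, `keep` and `timeout` are names made up for this example, and a real system would sit on top of an HTTP client rather than raw sockets.

```python
import asyncio

async def _connect(host, port):
    reader, writer = await asyncio.open_connection(host, port)
    return (host, port), reader, writer

async def fastest_peers(peers, keep, timeout=1.0):
    """Connect to every peer concurrently and keep the first `keep` that succeed.

    Returns a list of ((host, port), reader, writer) tuples.
    """
    loop = asyncio.get_running_loop()
    pending = {asyncio.ensure_future(_connect(h, p)) for h, p in peers}
    winners = []
    deadline = loop.time() + timeout
    while pending and len(winners) < keep:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        done, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
        )
        if not done:
            break  # overall deadline reached
        for task in done:
            if task.exception() is None:  # skip refused/unreachable peers
                winners.append(task.result())
    # Abandon the slow connection attempts.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Several peers may finish in the same wakeup; close any surplus.
    for _, _, writer in winners[keep:]:
        writer.close()
    return winners[:keep]
```

Calling `await fastest_peers(peer_list, keep=3)` would return up to three open connections, in whatever order they completed; slow or unreachable peers cost nothing beyond the timeout.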
For stage two, continue with HTTP as before. For efficiency, you could hold all the connections open at the end of phase one and then close all the ones except your chosen phase two partner. It's not clear from your description that the phase two exchange lends itself to the HTTP model, though, so I have to hand-wave this a bit.
It's also possible that you can simply hold TCP connections open to all available peers more or less permanently, thus dodging the cost of connection establishment nearly all the time. A thousand simultaneous open connections is large, but not outrageous in most contexts (although you may need to tweak OS settings to allow it). If you do that, you can just talk whatever protocol you like over TCP. If it's a truly peer-to-peer protocol, you only need one TCP connection per pair. Implementing this kind of thing is tricky, though: an average programmer will do a terrible job of it, in my experience.
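A minimal sketch of the keep-connections-open approach, again using asyncio. `PeerPool` is a hypothetical helper, not an existing library class; it keeps one connection per peer and reconnects lazily. It deliberately ignores the hard parts (keepalives, detecting half-open connections, backoff), which is where the tricky implementation work lies.

```python
import asyncio

class PeerPool:
    """Keeps one long-lived TCP connection per (host, port), opened on demand."""

    def __init__(self):
        self._conns = {}   # (host, port) -> (reader, writer)
        self._locks = {}   # (host, port) -> asyncio.Lock

    async def get(self, host, port):
        key = (host, port)
        # Per-peer lock so concurrent callers do not open duplicate connections.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            conn = self._conns.get(key)
            if conn is None or conn[1].is_closing():
                conn = await asyncio.open_connection(host, port)
                self._conns[key] = conn
            return conn

    async def close(self):
        for _, writer in self._conns.values():
            writer.close()
        self._conns.clear()
```

After the first `await pool.get(host, port)`, later calls for the same peer return the already-established connection, so no handshake round trip is paid on the request path.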