I'm currently implementing a reliable UDP transport inspired by KCP, Dragonite, and QUIC, purely for self-education purposes. I want to apply several optimizations, one of which is multiplexing.
My idea is: I split the data into small chunks (the chunk size is chosen according to the MTU) and send and receive them through multiple datagram sockets in parallel (on both client and server), using coroutines for the asynchronous I/O. A rough sketch of the sending side is below.
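This is only a sketch in Kotlin with kotlinx.coroutines; the chunk size, socket count, target address and port are placeholders, and all of the reliability logic (sequence numbers, ACKs, retransmission) is left out:

```kotlin
import java.net.DatagramPacket
import java.net.DatagramSocket
import java.net.InetAddress
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Placeholder values -- the real chunk size would be derived from the path MTU.
const val CHUNK_SIZE = 1200
const val SOCKET_COUNT = 4

// Split the payload into MTU-sized chunks.
fun split(data: ByteArray, size: Int): List<ByteArray> =
    (data.indices step size).map { data.copyOfRange(it, minOf(it + size, data.size)) }

fun main() = runBlocking {
    val target = InetAddress.getByName("127.0.0.1")      // placeholder target
    val port = 9000                                      // placeholder port
    val chunks = split(ByteArray(1 shl 20), CHUNK_SIZE)  // 1 MiB of example data

    // One socket per coroutine; chunks are assigned round-robin.
    val sockets = List(SOCKET_COUNT) { DatagramSocket() }
    val jobs = sockets.mapIndexed { i, socket ->
        launch(Dispatchers.IO) {
            var j = i
            while (j < chunks.size) {
                val chunk = chunks[j]
                socket.send(DatagramPacket(chunk, chunk.size, target, port))
                j += SOCKET_COUNT
            }
        }
    }
    jobs.forEach { it.join() }
    sockets.forEach { it.close() }
}
```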
Will this solution work? Should I expect performance improvement?
Contrary to TCP, UDP has no slow start, i.e. it can start sending at full speed (if that speed is known) from the beginning. Thus the limit on how fast data can be sent is essentially either the speed at which the local system can produce and send packets or the available bandwidth on the path. Assuming that sending is not CPU-bound, that the traffic of all of the multiple sockets you envision takes the same path (outgoing network card, routers, incoming network card), and that no connection-specific traffic shaping is done in middleboxes, then using multiple sockets should not result in increased speed, since it does not change how the various bottlenecks are used.
This changes if the sending is CPU-bound. In this case the use of multiple coroutines combined with multiple sockets might make better use of today's multi-processor systems: the per-packet work runs on multiple CPU cores at the same time, so more packets can be sent before the system becomes CPU-bound again.
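As a minimal sketch of that case, assuming the expensive part is some per-packet processing (the xorEncode function below is only a stand-in for whatever FEC, encryption, or checksumming makes sending CPU-bound), dispatching each per-socket sender on Dispatchers.Default lets the Kotlin runtime spread the work across cores:

```kotlin
import java.net.DatagramPacket
import java.net.DatagramSocket
import java.net.InetAddress
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Stand-in for the CPU-heavy per-packet work (FEC, encryption, checksums, ...).
fun xorEncode(chunk: ByteArray, key: Byte): ByteArray =
    ByteArray(chunk.size) { i -> (chunk[i].toInt() xor key.toInt()).toByte() }

suspend fun sendCpuHeavy(chunks: List<ByteArray>, sockets: List<DatagramSocket>,
                         target: InetAddress, port: Int) = coroutineScope {
    sockets.mapIndexed { i, socket ->
        launch(Dispatchers.Default) {   // CPU-bound work goes on the Default pool
            var j = i
            while (j < chunks.size) {
                val encoded = xorEncode(chunks[j], 0x5A)  // the expensive step
                socket.send(DatagramPacket(encoded, encoded.size, target, port))
                j += sockets.size
            }
        }
    }.forEach { it.join() }
}
```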
This also changes if the traffic is bandwidth-bound but there are alternative paths to the target system which provide additional bandwidth. By binding each socket to a different local IP address (on a different local network card) or by choosing a different target IP address (for the same target system), one might be able to use such an alternative path and thus make use of the additional bandwidth.
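For illustration, binding a socket to a specific local address makes its traffic leave via that interface; a minimal sketch, where the addresses are placeholders for whatever local network cards actually exist:

```kotlin
import java.net.DatagramSocket
import java.net.InetAddress

// Placeholder local addresses; in reality these would be the addresses of two
// different local network cards, each with its own path to the target system.
val localAddresses = listOf("192.0.2.10", "198.51.100.20")

// One socket per local address: port 0 picks an ephemeral port, and the bind
// address determines which interface (and hence which path) the traffic uses.
val sockets = localAddresses.map { addr ->
    DatagramSocket(0, InetAddress.getByName(addr))
}
```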
Similarly, multiple sockets might help if there is traffic shaping between client and server which limits the bandwidth per connection (i.e. per flow). Since each socket produces a separate flow, multiple sockets can increase the amount of usable bandwidth in this case.