Looking into the BitTorrent protocol I see that, for a request message, although the length field is 4 bytes long, the maximum permissible value is (normally) 2^14, i.e. 16 KiB. Why is this?
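For concreteness, BEP 3 defines the request message as a length-prefixed binary record: <len=0013><id=6><index><begin><length>, all integers big-endian. A minimal sketch of serializing one in Python:

```python
import struct

BLOCK_SIZE = 1 << 14  # 16 KiB, the de facto maximum peers will honour

def build_request(piece_index: int, begin: int, length: int = BLOCK_SIZE) -> bytes:
    """Serialize a 'request' message: <len=0013><id=6><index><begin><length>.

    The id is one byte; the other fields are 4-byte big-endian integers.
    The length field could express up to 2^32 - 1, but most clients drop
    the connection for requests above 2^14.
    """
    return struct.pack(">IBIII", 13, 6, piece_index, begin, length)

# e.g. request the third 16 KiB block of piece 42
msg = build_request(42, 2 * BLOCK_SIZE)
```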
This small transfer size seems to add a lot of complexity (having to handle a queue of requests and build up the piece out of the 16 KiB blocks). One upside I can see is that it gives the application control over rate limiting (via choking and unchoking). Is that benefit important enough to justify the added complexity?
The BitTorrent protocol is a stream of messages, including control messages. Larger chunk sizes would increase the latency of control messages to the point where the information they convey would be stale (timed out) by the time it is received.
On slow internet connections (say 128 kbit/s upload on ADSL, or two bonded ISDN channels) a single 16 KiB block already takes a whole second to transmit, assuming nothing else is using the connection - an assumption that rarely holds in practice.
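To make that concrete, a back-of-the-envelope calculation of the head-of-line delay a control message sees behind one in-flight data block, using the 128 kbit/s upload assumed above (pieces themselves are commonly 256 KiB to several MiB, so a whole-piece request would stall control traffic for minutes):

```python
UPLOAD_BPS = 128_000  # assumed 128 kbit/s ADSL upload from the example above

# Head-of-line delay: how long a just-queued control message waits
# behind a single in-flight data block of the given size.
for block_size in (16 * 1024, 256 * 1024, 4 * 1024 * 1024):
    delay_s = block_size * 8 / UPLOAD_BPS
    print(f"{block_size // 1024:>5} KiB block -> {delay_s:7.2f} s")
```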
Note that HTTP/2 also uses an initial frame size of 16 KiB to multiplex streams.
This small transfer size seems to add a lot of complexity (having to handle a queue of requests and build up the piece out of the 16 KiB blocks).
Those things are necessary anyway.
Sub-piece requests are needed so that chunks of a single large piece can be fetched from multiple sources at once, whether because the individual sources are slow or because the piece is high priority.
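A minimal sketch of that bookkeeping, with hypothetical names and a simple round-robin assignment just to show the shape of it:

```python
from itertools import cycle

BLOCK_SIZE = 1 << 14  # 16 KiB

def plan_block_requests(piece_length: int, peers: list):
    """Yield (peer, begin, length) so the sub-requests cover the piece
    while being spread round-robin across the available peers."""
    ring = cycle(peers)
    for begin in range(0, piece_length, BLOCK_SIZE):
        yield next(ring), begin, min(BLOCK_SIZE, piece_length - begin)

# e.g. a 256 KiB piece split between two peers -> 16 requests, 8 per peer
plan = list(plan_block_requests(256 * 1024, ["peer-a", "peer-b"]))
```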
Queues are needed to keep requests in flight at all times. Naïve request-receive-request cycles would leave an idle round trip after every block during which the remote sends no data.
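A common way to size such a queue is the bandwidth-delay product: keep enough requests outstanding to cover one round trip at the peer's current throughput. A sketch with assumed numbers:

```python
import math

BLOCK_SIZE = 1 << 14  # 16 KiB

def pipeline_depth(throughput_bytes_per_s: float, rtt_s: float) -> int:
    """Outstanding requests needed so the remote never sits idle:
    enough blocks to fill one bandwidth-delay product (minimum 1)."""
    bdp_bytes = throughput_bytes_per_s * rtt_s
    return max(1, math.ceil(bdp_bytes / BLOCK_SIZE))

# e.g. a peer delivering 10 MB/s at 100 ms RTT needs ~62 requests in flight
depth = pipeline_depth(10_000_000, 0.100)
```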