I am implementing RTP on an embedded MCU (STM32F4) and I'm having trouble streaming audio data (8 kHz, u-law encoded) efficiently.
For chunking audio data (20ms, 160 bytes) should I:
If (2), should there be an RTP header for each 160 bytes of audio data within the single UDP datagram? For example, 5 RTP packets would carry 800 bytes of audio data. Would I send:
Using Linphone as a client for testing, I am seeing multiple "Out Of Time" packets and a short delay between speaking into my embedded device and hearing the audio on Linphone. I'm trying to determine whether streaming data over UDP more efficiently will fix it. I do not get the same delay when speaking into Linphone and playing out of my embedded device, and the difference in delay between the two directions is making echo cancellation on the embedded MCU difficult.
Given that RTP is for real-time data and that each RTP payload belongs to a specific point in time, it makes no sense to combine multiple RTP packets (which are from different times) into the same UDP datagram. This means each RTP payload is prefixed by an RTP header and then sent immediately via UDP.
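As a rough illustration of "one header plus one 20 ms payload per datagram", here is a minimal sketch in C. It assumes the fixed 12-byte RTP header with no CSRC list or extension, payload type 0 (PCMU), and a timestamp that advances by one tick per u-law sample. The names `rtp_stream_t` and `rtp_build_packet` are made up for this example, and the marker bit is simply left at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN 12u  /* fixed header: no CSRC entries, no extension */
#define RTP_PT_PCMU    0u   /* static payload type for G.711 u-law */

/* Hypothetical per-stream state (illustrative names, not from any library). */
typedef struct {
    uint16_t seq;        /* advances by 1 for every packet sent */
    uint32_t timestamp;  /* advances by the number of samples per packet */
    uint32_t ssrc;       /* stream identifier, chosen once at start-up */
} rtp_stream_t;

/*
 * Build one RTP packet (header + payload) in `out`.
 * Returns the total length to hand to the UDP send call,
 * or 0 if `out` is too small.
 */
size_t rtp_build_packet(rtp_stream_t *s, const uint8_t *payload,
                        size_t payload_len, uint8_t *out, size_t out_cap)
{
    assert(s != NULL && payload != NULL && out != NULL);
    if (out_cap < RTP_HEADER_LEN + payload_len) {
        return 0;
    }

    out[0] = 0x80;                         /* V=2, P=0, X=0, CC=0 */
    out[1] = (uint8_t)RTP_PT_PCMU;         /* M=0, PT=0 */
    out[2] = (uint8_t)(s->seq >> 8);       /* sequence number, big-endian */
    out[3] = (uint8_t)(s->seq);
    out[4] = (uint8_t)(s->timestamp >> 24);/* timestamp, big-endian */
    out[5] = (uint8_t)(s->timestamp >> 16);
    out[6] = (uint8_t)(s->timestamp >> 8);
    out[7] = (uint8_t)(s->timestamp);
    out[8]  = (uint8_t)(s->ssrc >> 24);    /* SSRC, big-endian */
    out[9]  = (uint8_t)(s->ssrc >> 16);
    out[10] = (uint8_t)(s->ssrc >> 8);
    out[11] = (uint8_t)(s->ssrc);

    memcpy(out + RTP_HEADER_LEN, payload, payload_len);

    /* u-law is one byte per sample, so payload_len equals the sample count. */
    s->seq = (uint16_t)(s->seq + 1u);
    s->timestamp += (uint32_t)payload_len;

    return RTP_HEADER_LEN + payload_len;
}
```

With 160 bytes of audio this yields a 172-byte datagram every 20 ms. The intended use is to call it as soon as each 160-sample block is ready and pass the result straight to whatever UDP send function the network stack provides, rather than queueing several blocks first.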