Tags: javascript, network-programming, webrtc, lag

What might cause this >1000ms lag in WebRTC data channel messages?


When I set up a data channel between two browsers (tested on two different machines on the same network), I get different lag results in the following two cases.

Case 1: sending / receiving only

When I set up one side to send test messages at an interval of, for example, 70ms, they arrive on the other side without noticeable lag. The time between received messages is close to 70ms. So far so good.
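For reference, the receiving side of Case 1 can be sketched roughly like this. It is a minimal sketch, not the actual test code: `recordArrivalGaps` is a made-up helper name, `channel` is assumed to be an open RTCDataChannel (or anything with an `onmessage` property), and the clock is injectable only so the logic can be checked outside a browser.

```javascript
// Hypothetical helper: records the time between consecutive incoming messages.
// `channel` is assumed to be an open RTCDataChannel; `now` is injectable for testing.
function recordArrivalGaps(channel, now = () => Date.now()) {
  const gaps = [];
  let lastArrival = null;

  channel.onmessage = () => {
    const arrival = now();
    // The first message has no predecessor, so it produces no gap.
    if (lastArrival !== null) gaps.push(arrival - lastArrival);
    lastArrival = arrival;
  };

  return gaps;
}

// Sending side (browser only), assuming `channel` is open:
// setInterval(() => channel.send("test"), 70);
```

With a 70ms send interval, the recorded gaps should stay close to 70 when there is no lag.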

Case 2: Both sides sending and receiving in turn

When I set up both sides to send a message as soon as they have received a message from the other side AND more than 70ms have passed since they last sent one, everything works fine most of the time. Every few seconds, however (not at consistent intervals), I measure a delay of ~1000ms. The weird thing is that the time between the vast majority of messages is either < 200ms OR > ~1000ms.
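One possible reading of the Case 2 logic, as a minimal sketch: reply to every incoming message, but never sooner than 70ms after the previous send. The name `startPingPong` and the injectable `now` / `schedule` parameters are assumptions made for illustration, and the real test code may have handled the waiting differently.

```javascript
// Hypothetical helper: each side replies to an incoming message, but waits
// until at least `minIntervalMs` has passed since its own last send.
// Assumes strict ping-pong, i.e. only one message is in flight at a time.
function startPingPong(channel, minIntervalMs = 70, now = () => Date.now(), schedule = setTimeout) {
  let lastSent = -Infinity;
  const roundTrips = [];

  function sendNow() {
    lastSent = now();
    channel.send(String(lastSent));
  }

  channel.onmessage = () => {
    const arrival = now();
    // Round trip as seen from this side: own send until the reply arrives.
    if (lastSent !== -Infinity) roundTrips.push(arrival - lastSent);

    const wait = minIntervalMs - (arrival - lastSent);
    if (wait <= 0) sendNow();       // enough time has passed, reply right away
    else schedule(sendNow, wait);   // too soon, reply once the interval is over
  };

  // One side calls kick() once to start the exchange.
  return { kick: sendNow, roundTrips };
}
```

Both peers would run this on their end of the channel, and only one of them calls `kick()`.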


I tested both cases in (combinations of) Chrome and Firefox, and the behavior was similar. I also tested over a mobile phone network (using tethering), which showed the same lag, although less often. The data channel was not configured with any special options, so it uses a reliable, ordered connection.

What could be causing this? It seems to me that it can be fixed, since sending in one direction (either way) works fine without lag. I also tried using separate data channels for sending and receiving, which made no difference.


Examples

Here is an example of test results for the second case. It's a list of all the round trip times that were higher than 200ms for 1000 round trips.

(Delay index) round trip time - round trip number - time
(0) 2192 - 0 - "2016-05-06T12:34:18.193Z"
(1) 1059 - 111 - "2016-05-06T12:34:22.777Z"
(2) 1165 - 372 - "2016-05-06T12:34:32.485Z"
(3) 1062 - 434 - "2016-05-06T12:34:35.585Z"
(4) 1157 - 463 - "2016-05-06T12:34:37.598Z"
(5) 1059 - 605 - "2016-05-06T12:34:43.264Z"
(6) 1160 - 612 - "2016-05-06T12:34:44.633Z"
(7) 1093 - 617 - "2016-05-06T12:34:45.857Z"
(8) 1158 - 624 - "2016-05-06T12:34:47.204Z"
(9) 1162 - 688 - "2016-05-06T12:34:50.401Z"
(10) 1158 - 733 - "2016-05-06T12:34:52.962Z"
(11) 1161 - 798 - "2016-05-06T12:34:56.163Z"
(12) 1157 - 822 - "2016-05-06T12:34:58.077Z"
(13) 1158 - 888 - "2016-05-06T12:35:01.281Z"
(14) 1160 - 893 - "2016-05-06T12:35:02.563Z"
(15) 1085 - 898 - "2016-05-06T12:35:03.768Z" 
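A list like the one above can be produced by filtering the measured round trip times against the 200ms threshold. This is only a sketch; `slowRoundTrips` is a hypothetical name, and the input is assumed to be an array of round trip times in milliseconds, indexed by round trip number.

```javascript
// Hypothetical helper: keeps only the round trips slower than `thresholdMs`,
// together with their round trip number (the index in the input array).
function slowRoundTrips(rtts, thresholdMs = 200) {
  return rtts
    .map((rtt, roundTrip) => ({ rtt, roundTrip }))
    .filter((entry) => entry.rtt > thresholdMs);
}
```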

Here is another example, including a 'PacketsSentPerSecond' graph from chrome://webrtc-internals:

[Image: PacketsSentPerSecond graph]

In this test, ~2100 packets were sent, resulting in the following 26 round trips that took more than 900ms: [1762.6050000000014, 1179.7200000000012, 1765.375, 1149.945000000007, 1180.1399999999994, 1180.9550000000017, 1246.2450000000026, 1750.2649999999994, 1388.0149999999994, 1100.7499999999854, 4130.475000000006, 1160.1150000000052, 1082.4399999999878, 1055.2300000000105, 1498.715000000011, 1105.8850000000093, 1478.1600000000035, 2948.649999999994, 1538.2549999999756, 1839.9099999999744, 1768.6449999999895, 1167.929999999993, 1139.1750000000175, 1173.8850000000093, 1245.6600000000035, 1075.375]

I still haven't figured out what is causing this lag. I would expect a much smoother graph.


Solution

  • Although I'm still unsure what is causing the problem, I have found a solution. My best guess is that the problem is caused by flow control kicking in when one of the peers has not sent data for a while (or when its packets just don't reach the other side).

    I noticed there are no problems when both peers send packets to each other at a 70ms interval without waiting for a packet from the other side. As soon as I delay sending a packet while waiting for an incoming packet, I get the >1000ms lags.

    So what I do now is send packets at a steady rate EVEN if they are empty. My application requires sending data in turn, but I just check at an interval whether there is anything to send, and if not, I still send an empty packet. This way, the problem seems to be solved in the tests I have done so far!
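The steady-rate workaround could look roughly like the sketch below. The names (`createSteadySender`, `enqueue`, `tick`) are invented for illustration, and an empty string is used here as a stand-in for the "empty packet"; the answer does not say what exactly was sent, so treat that as an assumption. The receiver has to ignore these empty messages.

```javascript
// Hypothetical helper: sends something on every tick, real data if any is
// queued, otherwise an empty filler message (assumed here to be "").
function createSteadySender(channel) {
  const queue = [];

  return {
    // Application code queues messages instead of sending them directly.
    enqueue(message) {
      queue.push(message);
    },
    // Call once per send interval.
    tick() {
      if (channel.readyState !== "open") return; // nothing to do until the channel is open
      channel.send(queue.length > 0 ? queue.shift() : "");
    },
  };
}

// Browser usage, assuming `dataChannel` is an RTCDataChannel:
// const sender = createSteadySender(dataChannel);
// setInterval(() => sender.tick(), 70);
// dataChannel.onmessage = (e) => { if (e.data !== "") handleMessage(e.data); }; // handleMessage is your own handler
```

The point of the design is that the send rate no longer depends on when the other side's message arrives, which matches the observation that unconditional 70ms sending showed no lag.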