How is received packet loss calculated in the Microsoft Teams app?

I just found a new call health feature in the Microsoft Teams application, which shows statistics such as round-trip time, inter-arrival jitter, and received packet loss during a call.

I'm interested to know what received packet loss specifically means here. Does this value correspond to the loss of packets from a single speaker? In a conference call there can be multiple senders, so how might they be calculating this value?

My understanding is this: suppose a receiver receives a series of packets from one or more senders. It can check which sequence numbers are missing in the stream from each sender/speaker and calculate the packet loss from that (although with multiple senders there still needs to be an aggregation function). But doesn't this also depend on whether the Teams app uses UDP or TCP? Most likely it is UDP, but I'm not sure. And if it uses UDP, packets can arrive out of order, so calculating loss with the above idea doesn't sound reasonable.

Please give your insights on this.

Solution

The Teams implementation of its audio streams is not documented anywhere, so the details are specific to Microsoft's implementation.

We can take a standard VoIP SIP endpoint implementation as the most likely reference for how it works.

Audio is normally received as a stream of RTP packets over UDP. Each RTP packet carries a sequence number, so it is easy to work out the packet loss of your received RTP data. The packets are normally fed into a jitter buffer implementation. A jitter buffer typically reorders packets that arrive out of sequence, so reordering by itself does not have to be counted as loss. From that you can easily calculate your received RTP metrics (jitter, packet loss, etc.).
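As a rough illustration of the sequence-number idea, here is a minimal Python sketch loosely modelled on the expected-versus-received bookkeeping described in RFC 3550. The class name `RtpLossCounter` and its methods are made up for this example, and nothing here is taken from Teams. Because it compares how many packets were expected with how many actually arrived, a packet that merely arrives out of order is not counted as lost.

```python
class RtpLossCounter:
    """Hypothetical per-stream loss counter based on 16-bit RTP sequence numbers."""

    SEQ_MOD = 1 << 16  # RTP sequence numbers wrap around after 65535

    def __init__(self):
        self.base_seq = None  # first sequence number seen
        self.max_seq = 0      # highest sequence number seen (within the current cycle)
        self.cycles = 0       # count of wrap-arounds, in multiples of SEQ_MOD
        self.received = 0     # packets that actually arrived (duplicates included)

    def on_packet(self, seq):
        if self.base_seq is None:
            self.base_seq = seq
            self.max_seq = seq
        else:
            delta = (seq - self.max_seq) % self.SEQ_MOD
            if delta < self.SEQ_MOD // 2:
                # Packet is at or ahead of the highest one seen so far.
                if seq < self.max_seq:
                    self.cycles += self.SEQ_MOD  # sequence number wrapped
                self.max_seq = seq
            # Otherwise it is a late (reordered) packet: it still counts
            # as received, but the highest sequence number is unchanged.
        self.received += 1

    def expected(self):
        if self.base_seq is None:
            return 0
        return self.cycles + self.max_seq - self.base_seq + 1

    def lost(self):
        # Clamped at zero because duplicates can push "received" above "expected".
        return max(0, self.expected() - self.received)

    def loss_fraction(self):
        exp = self.expected()
        return self.lost() / exp if exp else 0.0


counter = RtpLossCounter()
for seq in [100, 101, 103, 102, 105]:  # 102 arrives late, 104 never arrives
    counter.on_packet(seq)
print(counter.lost(), "lost out of", counter.expected(), "expected")
```

In this run only packet 104 is missing, so one packet is lost out of six expected, even though 102 arrived after 103.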

If the SIP call setup also negotiated an RTCP channel, you can send the metrics you generated to the other side and receive the metrics the other side generated for the stream you send.

This is how a SIP client endpoint can display the metrics you are talking about.
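For reference, the two values an RTP receiver usually reports over RTCP can be sketched as follows. This follows the formulas in RFC 3550 (the interarrival jitter estimator and the 8-bit "fraction lost" field); the function names are made up for this example, the clamp to 255 is my own addition, and whether Teams computes its displayed numbers this way is an assumption.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """One step of the RFC 3550 interarrival jitter estimator.

    arrival and rtp_timestamp must be in the same units (RTP clock ticks).
    Returns the new (jitter, transit) pair; pass prev_transit=None for the
    first packet of a stream.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        return jitter, transit
    d = abs(transit - prev_transit)
    jitter += (d - jitter) / 16.0  # smoothing with gain 1/16
    return jitter, transit


def fraction_lost_byte(expected_interval, lost_interval):
    """Fraction lost since the last report, as an 8-bit fixed-point value (x/256)."""
    if expected_interval <= 0 or lost_interval <= 0:
        return 0
    # Clamp added for this sketch so the result always fits in one byte.
    return min(255, (lost_interval << 8) // expected_interval)


# Usage: (arrival, rtp_timestamp) pairs, both in RTP clock ticks.
jitter, transit = 0.0, None
for arrival, ts in [(0, 0), (170, 160), (320, 320)]:
    jitter, transit = update_jitter(jitter, transit, arrival, ts)
print("jitter estimate:", jitter)
print("fraction lost field:", fraction_lost_byte(100, 25))  # 25 of 100 lost
```

A receiver would divide the fraction-lost byte by 256 to turn it back into a ratio before displaying it as a percentage.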

Since you are talking about a conference call, you are not talking directly to the other participants. What you have is a call between yourself and the conference server. All other participants in the conference have their own one-to-one calls to the conference server. The conference server can then "mix" the audio and send the mixed audio to each participant (or send each stream separately and let the client mix the audio; this is implementation specific).

So when you are talking about packet loss, you are talking about packet loss between you and the other end you are talking to. For a conference call, this is the conference server.
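If the server forwards each participant's stream separately instead of mixing, the client would see several RTP streams (each identified by its own SSRC) and would need some aggregation to show a single number. One plausible choice, purely an assumption and not something known about Teams, is to pool the counts across streams. The function name `aggregate_loss` and the input shape are made up for this sketch.

```python
def aggregate_loss(per_stream):
    """Pool loss across streams.

    per_stream maps an SSRC to an (expected, received) tuple of packet counts.
    Returns total lost packets divided by total expected packets.
    """
    total_expected = sum(expected for expected, _ in per_stream.values())
    total_lost = sum(max(0, expected - received)
                     for expected, received in per_stream.values())
    return total_lost / total_expected if total_expected else 0.0


# Usage with three hypothetical streams coming from the conference server.
streams = {
    0x1111: (100, 95),  # 5 packets lost
    0x2222: (50, 50),   # no loss
    0x3333: (50, 45),   # 5 packets lost
}
print(f"{aggregate_loss(streams):.1%}")
```

Pooling weights busy streams more heavily than quiet ones; averaging the per-stream percentages instead would be an equally defensible choice. Either way, the loss being measured is still on the path between you and the conference server.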