microsoft-teams  transient-failure

Unable to consistently join meetings in Microsoft Teams: "Oh dear! Your call has dropped."


I spent weeks researching an issue where I could open Teams but could not join meetings. I could make ad-hoc calls with other team members, but meetings would drop within 5-10 seconds with the error message "Oh dear! Your call has dropped. Please try again."


Solution

  • There was a bug in my router (and in co-workers' routers from other vendors) that was double-mapping ports.

    The resolution was to set up port forwarding with the following rules:

    Teams Audio: UDP 50000-50019

    Teams Video: UDP 50020-50039

    Teams Sharing: UDP 50040-50059

    Here are the full details of the problem:

    • From the media log files for working and non-working calls, the connectivity checks do not always run to completion. Initially, in both the working and non-working cases, the client does receive a response directly from the MP (Media Processor) IP, as shown below.

    Working

    tc::icemachine::IceMachineImpl::ProcessReceive 13:41:20.835 18208 TL_INFO [19ABA2EF970]: ICEMCHN #0 ProcessReceive() PipeInfo{UDP, Local:10.10.10.0:50006, PalBasedPipe} PipeData{IceData, Encap: Turn, {IP:}, Peer:104.209.195.0:50232, {010100342112a442b3...}}

    Non-working

    tc::icemachine::IceMachineImpl::ProcessReceive 13:40:59.885 18208 TL_INFO [19ABA832A20]: ICEMCHN #0 ProcessReceive() PipeInfo{UDP, Local:10.10.10.0:50002, PalBasedPipe} PipeData{IceData, Encap: None, {IP:}, Peer:104.209.192.0:51402, {000100542112a4423f...}}

    However, in the working call the host IP pair is not promoted, and the call works through the relay IP on port 3480:

    ICEMCHN Pairs Dump: Failed IceCandidatePair{ P:0x7efffdfefdfffbfc L:IceCandidate{F:1 Rtp:{HostUDP, {IP:10.10.10.0:50006}, base:10.10.10.0:50006, rel:10.10.10.0:50006, bw:0, p:0x7efffdfe, pipe:UDP}, Rtcp:{Mux}} R:IceCandidate{D, F:1 Rtp:{StunUDP, {IP:104.209.195.0:50232}, base:, rel:, bw:0, p:0x7efffdfe, pipe:None}, Rtcp:{Mux}} DL:,}

    ICEMCHN Pairs Dump: Succeeded IceCandidatePair{ P:0x0afff5fefdfffbfc L:IceCandidate{D, F:5 Rtp:{TurnUDP, {IP:52.114.188.0:3480, ID:{864350ac4fd269e1}}, base:10.10.10.0:50006, rel:108.168.97.0:50006, bw:0, p:0x0afff5fe, pipe:UDP}, Rtcp:{Mux}} R:IceCandidate{D, F:1 Rtp:{StunUDP, {IP:104.209.195.0:50232}, base:, rel:, bw:0, p:0x7efffdfe, pipe:None}, Rtcp:{Mux}} DL:52.114.188.0:3480,52.114.188.0:3480}

    In the non-working logs there is no response to the host pair, and the client also fails to find a fallback path:

    TL_WARN [19ABA832A20]: ICEMCHN #0, Fallback path is not found
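    If you want to run the same comparison on your own client media logs, here is a minimal parsing sketch. It assumes the lines follow the ProcessReceive format shown above (this is a best-effort pattern, not an official log schema), and the log file name is just a placeholder:

    import re

    # Best-effort pattern for the ICEMCHN ProcessReceive lines shown above.
    LINE_RE = re.compile(
        r"ProcessReceive\(\) PipeInfo\{(?P<proto>\w+), Local:(?P<local>[\d.]+:\d+).*?"
        r"PipeData\{IceData, Encap: (?P<encap>\w+),.*?Peer:(?P<peer>[\d.]+:\d+)"
    )

    def summarize(log_path):
        """Print local endpoint, encapsulation and peer for every ProcessReceive line."""
        with open(log_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                m = LINE_RE.search(line)
                if m:
                    print(f"local={m['local']:<22} encap={m['encap']:<5} peer={m['peer']}")

    summarize("media_log.txt")  # placeholder path to an exported media log

    Running it over the working and non-working logs side by side makes the difference between the two samples above easier to spot.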

    Upon discussion with the Product Group, we found that:

    1. The client sends a connectivity check packet to the MP for the audio modality: source port 50006, destination port 53176. The MP receives it; the mapped source port is 1026.

    2. The client sends a connectivity check packet to the MP for the video modality: source port 50032, destination port 57270. The MP receives it; the mapped source port is 1026.

    3. The client sends a connectivity check packet to the MP for the app-sharing modality: source port 50052, destination port 59746. The MP receives it; the mapped source port is 1026.

    So the NAT is using the same source port for multiple different source and destination port pairs. As the call proceeds, the client sends out more connectivity check packets to the MP and doesn't get responses back for either audio or video; when the audio modality fails, the client gives up on the call. From the MP logs, though, I can tell that the MP is in fact receiving those requests and responding to them.
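    One way to observe this from the client side, outside of Teams, is to send a STUN Binding Request from one port in each of the three ranges and look at the XOR-MAPPED-ADDRESS the server echoes back, i.e. the public ip:port the NAT assigned to that flow. A rough sketch follows; the STUN server used here is just a reachable public stand-in, not the Teams MP or relay, and Teams must not be holding those local ports while you run it:

    import os
    import socket
    import struct

    MAGIC_COOKIE = 0x2112A442  # the 2112a442 value visible in the hex dumps above

    def stun_mapped_address(src_port, server=("stun.l.google.com", 19302), timeout=2.0):
        """Send a STUN Binding Request from src_port and return the (ip, port) the server saw."""
        txn_id = os.urandom(12)
        request = struct.pack("!HHI12s", 0x0001, 0, MAGIC_COOKIE, txn_id)  # Binding Request, length 0

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        sock.bind(("0.0.0.0", src_port))
        try:
            sock.sendto(request, server)
            data, _ = sock.recvfrom(2048)
        finally:
            sock.close()

        offset = 20  # skip the 20-byte STUN header, then walk the attributes
        while offset + 4 <= len(data):
            attr_type, attr_len = struct.unpack_from("!HH", data, offset)
            if attr_type == 0x0020:  # XOR-MAPPED-ADDRESS: reserved, family, X-Port, X-Address
                xport = struct.unpack_from("!H", data, offset + 6)[0]
                xaddr = struct.unpack_from("!I", data, offset + 8)[0]
                return (socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE)),
                        xport ^ (MAGIC_COOKIE >> 16))
            offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
        return None

    for src in (50006, 50032, 50052):  # one port each from the audio, video, sharing ranges
        print(src, "->", stun_mapped_address(src))

    Pointing it at two different server addresses and comparing the mapped ports per destination mirrors the relay-versus-MP behaviour described here.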

    From the Wireshark traces I could see a request sent from the client IP to the Media Processor IP, as below:

    6052 37.667639 10.10.10.110 137.116.60.197 STUN 150 Binding Request user: nz22:7IQN
    Internet Protocol Version 4, Src: 10.10.10.110, Dst: 137.116.60.197
    User Datagram Protocol, Src Port: 50006, Dst Port: 53176

    And the response is forwarded by the NAT to a different port, as below:

    6065 37.733923 137.116.60.197 10.10.10.110 STUN 114 Binding Success Response XOR-MAPPED-ADDRESS: 108.168.97.86:1026
    Internet Protocol Version 4, Src: 137.116.60.197, Dst: 10.10.10.110
    User Datagram Protocol, Src Port: 53176, Dst Port: 50058
    XOR-MAPPED-ADDRESS: 108.168.97.86:1026
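    As a side note, the XOR-MAPPED-ADDRESS value Wireshark displays is already decoded: on the wire, the port and address are XORed with the STUN magic cookie (the 2112a442 visible in the hex dumps above). A quick illustration using the 108.168.97.86:1026 value from this response; the XOR is symmetric, so the same functions convert in both directions:

    MAGIC_COOKIE = 0x2112A442

    def xor_port(port):
        # Ports are XORed with the top 16 bits of the magic cookie.
        return port ^ (MAGIC_COOKIE >> 16)

    def xor_ipv4(ip):
        # IPv4 addresses are XORed byte-wise with the magic cookie.
        cookie = MAGIC_COOKIE.to_bytes(4, "big")
        return ".".join(str(int(o) ^ c) for o, c in zip(ip.split("."), cookie))

    print(xor_ipv4("108.168.97.86"), xor_port(1026))  # the raw values actually carried in the packet
    print(xor_port(xor_port(1026)))                   # applying the XOR twice gives back 1026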

    The traffic above clearly shows that traffic sent from the MP gets routed back through the NAT (108.168.97.86:1026), which rewrites the port. The NAT is not properly maintaining its mappings and is forwarding all packets received on port 1026 to the same private source port, probably 50052, the modality that worked. 50052 is probably receiving everything that arrives at 1026, and that's why I see packet-dropped traces: it is receiving packets that should really go to 50032 or 50006.

    Based on our research and analysis, the problem appears to be the NAT mappings. As seen in the Wireshark traces, the NAT is forwarding all packets received on port 1026 to the same private port, probably 50052, which is used for application sharing; in the logs I could see that the application-sharing modality succeeded. The problem is that the NAT is not keeping the paths separate: traffic from all three server ports ends up going to the same private port, 50052 in this case.

    The NAT can choose to give those private ports separate public ports, or it can give them the same public port, as it is doing here; either way is valid. The catch is that if it decides to reuse the same public port, then the only way it knows which private port to forward incoming traffic to is by keeping track of which destination the traffic is coming from or being sent to, which this NAT does not appear to be doing.

    The first thing the Teams client does from a given source port (e.g. 50006) is send allocation traffic to its relay. This is the first egress media packet the NAT will see. NATs will often try to give this traffic the same public port as the private port, and in fact this NAT does seem to do that: the port the relay saw on the NAT was the same as the private port, 50006 (likewise for video and app sharing). Port 1026 didn't come into play until the NAT saw packets from that same source port bound for a different destination, in this case the MP. So the first packet from 50006, destined for the relay, was assigned public port 50006 by the NAT, but a packet from the same private port destined for the MP (a different IP and port than the relay) got the 1026 NAT source port.

    The server does actually try to send a packet directly to the IP of the computer, but that IP is a private IP, so that packet will never get near the machine. The server also tries to send a packet to the public IP address that the client sent in its offer; this usually happens before the client is ready to receive it, and so most NATs drop those packets.
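    To make that distinction concrete, here is a toy sketch (not any real NAT implementation) contrasting a NAT that reuses one public port but remembers which remote endpoint each mapping was created for, with one that collapses everything arriving on that public port onto a single private port, which is effectively what this router was doing:

    class TrackingNat:
        """Reuses one public port but keys the reverse mapping on the remote endpoint."""
        def __init__(self, public_port):
            self.public_port = public_port
            self.mappings = {}  # (public_port, remote endpoint) -> private endpoint

        def outbound(self, private, remote):
            self.mappings[(self.public_port, remote)] = private

        def inbound(self, remote):
            # Forward based on who the inbound traffic came from.
            return self.mappings.get((self.public_port, remote))

    class CollapsingNat(TrackingNat):
        """Also reuses one public port, but ignores the remote endpoint on the way back in."""
        def inbound(self, remote):
            # Whatever mapping was written last wins, regardless of the sender.
            return list(self.mappings.values())[-1] if self.mappings else None

    MP_AUDIO, MP_VIDEO, MP_SHARE = ("mp", 53176), ("mp", 57270), ("mp", 59746)

    for nat_cls in (TrackingNat, CollapsingNat):
        nat = nat_cls(public_port=1026)
        nat.outbound(("client", 50006), MP_AUDIO)   # audio connectivity check
        nat.outbound(("client", 50032), MP_VIDEO)   # video connectivity check
        nat.outbound(("client", 50052), MP_SHARE)   # app-sharing connectivity check
        print(nat_cls.__name__, nat.inbound(MP_AUDIO), nat.inbound(MP_VIDEO), nat.inbound(MP_SHARE))

    The tracking variant returns each modality's responses to the right private port; the collapsing variant delivers everything to 50052, which is the packet-dropped pattern described above.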