Search code examples
pjsip

Should I call pjsua_call_make_call (pj::Call::makeCall) in a background thread?


I am developing a multi-account SIP softphone with custom media transport. I need to manage multiple calls simultaneously. I have read in "PJSUA2 Book" (3 PJSUA2-High Level API / General concepts / Asynchronous Operations):

(...) all operations that involve sending and receiving SIP messages are asynchronous, meaning that the function that invokes the operation will complete immediately (...) When this function [pj::Call::makeCall] returns successfully, it does not mean that the call has been established, but rather it means that the call has been initiated successfully. You will be given the report of the call progress and/or completion in the onCallState() callback method of Call class.

So I thought that I could call pj::Call::makeCall whenever/wherever I needed to initiate a call, the function would return nearly immediately and any progress would be reported in the callback methods. But today in my test environment I noticed a few seconds delay between the moment I called the function and it returned. Below there are detailed PJSIP logs that appear between makeCall is called and it returned. One can see almost 4s gap before "RTP socket reachable":

12:08:32.727           pjsua_call.c !Making call with acc #0 to sip:[email protected]:5060
12:08:32.727      dlg0x7f3a54002f08  .UAC dialog created
12:08:32.727      dlg0x7f3a54002f08  ..Session count inc to 2 by mod-pjsua
12:08:32.727          pjsua_media.c  .Call 0: initializing media..
12:08:36.069          pjsua_media.c  ..RTP socket reachable at 10.0.0.100:4000
12:08:36.069          pjsua_media.c  ..RTCP socket reachable at 10.0.0.100:4001
12:08:36.069     srtp0x7f3a540052d0  ..SRTP keying SDES created
12:08:36.069          pjsua_media.c  ..Media index 0 selected for audio call 0
12:08:36.069          pjsua_media.c  ..Call 0: media transport initialization complete: Success
12:08:36.069      dlg0x7f3a54002f08  ..Session count dec to 2 by mod-pjsua
12:08:36.069      dlg0x7f3a54002f08  .Module mod-invite added as dialog usage, data=0x7f3a5400d408
12:08:36.069      dlg0x7f3a54002f08  ..Session count inc to 4 by mod-invite
12:08:36.069      dlg0x7f3a54002f08  .Module mod-100rel added as dialog usage, data=0x7f3a5400f0a8
12:08:36.069      dlg0x7f3a54002f08  .100rel module attached
12:08:36.069      inv0x7f3a54002f08  .UAC invite session created for dialog dlg0x7f3a54002f08
12:08:36.069               endpoint  .Request msg INVITE/cseq=7778 (tdta0x7f3a5400f308) created.
12:08:36.069      inv0x7f3a54002f08  ..Sending Request msg INVITE/cseq=7778 (tdta0x7f3a5400f308)
12:08:36.069      dlg0x7f3a54002f08  ...Sending Request msg INVITE/cseq=7778 (tdta0x7f3a5400f308)
12:08:36.069      tsx0x7f3a54012278  ....Transaction created for Request msg INVITE/cseq=7777 (tdta0x7f3a5400f308)
12:08:36.069      tsx0x7f3a54012278  ...Sending Request msg INVITE/cseq=7777 (tdta0x7f3a5400f308) in state Null
12:08:36.069          sip_resolve.c  ....Target '10.0.0.21:5060' type=Unspecified resolved to '10.0.0.21:5060' type=UDP (UDP transport)
12:08:36.069      tsx0x7f3a54012278  ....State changed from Null to Calling, event=TX_MSG
12:08:36.070      dlg0x7f3a54002f08  .....Transaction tsx0x7f3a54012278 state changed to Calling
12:08:36.070            pjsua_acc.c !Sending 2 bytes keep-alive packet for acc 0 to 10.0.0.21:5060
12:08:36.070     tdta0x7f3a44005248  Destroying txdata raw
12:08:36.070            pjsua_acc.c  Sending 2 bytes keep-alive packet for acc 1 to 10.0.0.21:5060
12:08:36.070     tdta0x7f3a44005248  Destroying txdata raw

Is it normal? I mean, should I redesign the application and call makeCall in a background thread? Or is it a result of some misconfiguration? Or a bug (somewhere)?

When I compare the PJSIP logs with timestamped Wireshark output it seems to me that indeed no SIP packets are sent to PBX before makeCall returns. So these few seconds seem to be wasted on internal PJSIP work. Could it be an effect of some (temporary) deadlock?

The application runs on 64-bit Linux (Fedora 31 Workstation in this case).


Solution

  • After even more research I can reply to my own question. Maybe it could help someone...

    I discovered makeCall triggered two DNS queries for localhost which failed after nearly 2 second each. That would make nearly 4 second gap.

    The solution is to disable localhost IP address resolution completely by adding #define PJ_GETHOSTIP_DISABLE_LOCAL_RESOLUTION 1 into the pjlib/include/pj/config_site.h configuration file and recompile the PJSIP library.

    The issue turned out to be already known and has been resolved in case of iOS: https://trac.pjsip.org/repos/ticket/1342

    There was somebody else who experienced the same problem on Linux: https://www.spinics.net/lists/pjsip/msg20517.html