
What are the differences between VoiceXML and TwiML / PlivoXML?


I have been tasked with researching these two implementations to understand how they differ in development difficulty and feature set, but I have not found any clear and concise comparison of the two.


Solution

  • I think you're asking about the difference between things like VoiceXML, TwiML, and PlivoXML. Both Tropo and Nexmo support VoiceXML, so this is a comparison of the XML formats (and associated platforms), not the specific vendors. I added PlivoXML as it's similar to TwiML, but distinct. Disclaimer: I work for Nexmo.

    All three describe what happens during a phone call: how the machine interacts with the caller. They are essentially HTML for a phone call, allowing you to present information to the user (play audio, read text) or get information from the user (record audio, recognize speech, capture pressed digits).

    Portability

    VoiceXML is an industry standard, and like HTML it's managed by the W3C. Both TwiML and PlivoXML are proprietary. That means a VoiceXML application isn't tied to a specific vendor.

    Input

    All three support recording audio or capturing DTMF (keypresses). VoiceXML also supports grammars, which let you recognize speech and tune the recognition engine. TwiML and PlivoXML do not have that support.

    TwiML example (expecting DTMF):

    <Response>
        <Gather action="process.php">
            <Say>Press a few digits.</Say>
        </Gather>
    </Response>
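
    PlivoXML example (expecting DTMF), as a rough equivalent of the TwiML above. This is a sketch rather than a tested document: it assumes Plivo's GetDigits and Speak elements, and the action URL is a hypothetical placeholder:

    <Response>
        <!-- action URL is a placeholder; the collected digits are posted to it -->
        <GetDigits action="https://example.com/process.php" method="POST">
            <Speak>Press a few digits.</Speak>
        </GetDigits>
    </Response>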
    

    VoiceXML example (expecting DTMF or recognition):

    <vxml version="2.1">
        <form>
            <field name="department">
                <prompt>Press 1 or say sales, press 2 or say support.</prompt>
                <grammar xml:lang="en-US" root = "TOPLEVEL" mode="voice" >
                    <rule id="TOPLEVEL" scope="public">
                        <one-of>
                            <item> sales </item>
                            <item> support </item>
                        </one-of>
                    </rule>
                </grammar>
                <grammar xml:lang="en-US" root = "TOPLEVEL" mode="dtmf" >
                    <rule id="TOPLEVEL" scope="public">
                        <one-of>
                            <item> 1 <tag> out.department="sales"; </tag> </item>
                            <item> 2 <tag> out.department="support"; </tag> </item>
                        </one-of>
                    </rule>
                </grammar>
            </field>
            <block>
                <submit next="../php/form.php" method="post"/>
            </block>
        </form>
    </vxml>
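
    TwiML example (recording audio). Recording follows the same request pattern as DTMF. A minimal sketch, assuming the Record verb; the action URL is a hypothetical placeholder that would receive the recording details:

    <Response>
        <Say>Leave a message after the beep, then press pound.</Say>
        <!-- action URL is a placeholder; maxLength is in seconds -->
        <Record action="https://example.com/recording.php" maxLength="30" finishOnKey="#" />
    </Response>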
    

    Output

    All three support both text-to-speech and playing audio (referenced by a link). Plivo also allows you to play audio to an ongoing call using the API, but that's outside the context of PlivoXML.

    TwiML example:

    <Response>
        <Say>Hello From TwiML</Say>
    </Response>
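
    PlivoXML example, sketched on the assumption that Speak handles text-to-speech and Play takes a link to an audio file (the URL is a placeholder):

    <Response>
        <Speak>Hello from PlivoXML</Speak>
        <!-- placeholder URL for a hosted audio file -->
        <Play>https://example.com/hello.mp3</Play>
    </Response>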
    

    VoiceXML example:

    <vxml version="2.1">
        <form>
            <block>
                <prompt>Hello from VXML!</prompt>
            </block>
        </form>
    </vxml>
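
    VoiceXML example (playing audio). A sketch using the audio element with a placeholder URL; the text inside the element is fallback content, spoken if the file can't be fetched:

    <vxml version="2.1">
        <form>
            <block>
                <prompt>
                    <!-- placeholder URL; the inner text is the spoken fallback -->
                    <audio src="https://example.com/hello.wav">Hello from VXML!</audio>
                </prompt>
            </block>
        </form>
    </vxml>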
    

    Variables & State

    TwiML and PlivoXML allow you to track some session state, just as a browser would; however, VoiceXML has a much more useful concept of state, allowing you to share variables across multiple requests.

    A TwiML or PlivoXML document can only really collect one thing at a time. Getting digits or a recording from a user is analogous to a form post with a single element.
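
    To illustrate, collecting two values in TwiML takes two documents, with the server carrying the first answer over to the second request. This is only a sketch: both URLs and the account query parameter are hypothetical.

    <!-- Document 1: collect the account number -->
    <Response>
        <Gather action="https://example.com/account.php" numDigits="6">
            <Say>Enter your six digit account number.</Say>
        </Gather>
    </Response>

    <!-- Document 2: generated by account.php, which has to remember the first answer itself -->
    <Response>
        <Gather action="https://example.com/pin.php?account=123456" numDigits="4">
            <Say>Now enter your four digit PIN.</Say>
        </Gather>
    </Response>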

    VoiceXML forms are not limited to a single input, and can contain multiple fields of recognized speech, DTMF presses, and recordings. VoiceXML also allows that data to be played or read back to the user in the same document, as it's simply a variable. In fact, a single VoiceXML document can have multiple forms, and a user can navigate between those forms.

    VoiceXML example:

    <form id="welcome">
        <field name="customer_type">
            <prompt>Say 'new' or press 1 if you're a new customer, press 2 or say 'existing' if you have an account.</prompt>
            <grammar xml:lang="en-US" root = "TOPLEVEL" mode="voice" >
                ...
            </grammar>
            <grammar xml:lang="en-US" root = "TOPLEVEL" mode="dtmf" >
                ...
            </grammar>
        </field>
        <filled>
            <prompt cond="customer_type=='new'">
                Thanks for contacting us.
            </prompt>
            <prompt cond="customer_type=='existing'">
                Thanks for being a loyal customer.
            </prompt>
            <goto expr="'#' + customer_type" />
        </filled>
    </form>
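
    The goto above assumes the same document also contains forms whose ids match the two possible values of customer_type. A sketch of what those might look like (the prompts are invented for illustration):

    <!-- Reached when customer_type is 'new' -->
    <form id="new">
        <block>
            <prompt>Let's get you set up with an account.</prompt>
        </block>
    </form>
    <!-- Reached when customer_type is 'existing' -->
    <form id="existing">
        <block>
            <prompt>Let's look up your account.</prompt>
        </block>
    </form>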
    

    Conferencing & Queues

    TwiML and PlivoXML support adding a call to a conference from the XML document. TwiML also supports the concept of a queue, and adding a call to it, right from TwiML; PlivoXML does not have that queue support. VoiceXML has no notion of conferencing or queueing in the VXML document, although an API may provide an external mechanism to conference multiple active calls together.

    TwiML example:

    <Response>
        <Dial>
            <Conference>Room 1234</Conference>
        </Dial>
    </Response>
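
    TwiML example (queueing). A sketch assuming the Enqueue verb and the Queue noun inside Dial; the queue name is arbitrary:

    <!-- Caller's document: place the call in a queue named "support" -->
    <Response>
        <Enqueue>support</Enqueue>
    </Response>

    <!-- Agent's document: connect to the next caller waiting in "support" -->
    <Response>
        <Dial>
            <Queue>support</Queue>
        </Dial>
    </Response>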
    

    Transfers

    All three support adding a second leg to an ongoing call. VoiceXML allows you to use the output of the transfer to direct the rest of the document.

    TwiML example:

    <Response>
        <Dial timeout="10" record="true">415-123-4567</Dial>
    </Response>
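
    PlivoXML example, sketched on the assumption that Dial wraps one or more Number elements (the phone number is a placeholder):

    <Response>
        <Dial timeout="10">
            <!-- placeholder destination number -->
            <Number>14151234567</Number>
        </Dial>
    </Response>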
    

    VoiceXML example:

    <vxml version="2.1">
        <form>
            <transfer name="result" dest="tel:+14158058810" bridge="true">
                <prompt>Please wait while we transfer you.</prompt>
                <grammar xml:lang="en-US" root = "TOPLEVEL" mode="voice">
                    <rule id="TOPLEVEL" scope="public">
                        <one-of>
                            <item> disconnect </item>
                        </one-of>
                    </rule>
                </grammar>
            </transfer>
            <filled>
                <if cond="result == 'busy'">
                    <prompt>Sorry, they're busy.</prompt>
                <elseif cond="result == 'noanswer'" />
                    <prompt>Sorry, they didn't answer.</prompt>
                <else />
                    <prompt>You spoke for <value expr="result$.duration" /> seconds.</prompt>
                </if>
    
                <if cond="result$.inputmode == 'voice'">
                    You ended the call by saying, <value expr="result$.utterance" />.
                </if>
            </filled>
            <block>
                Thanks for using the transfer element.
            </block>
        </form>
    </vxml>
    

    Extensibility

    All three allow the call to follow some concept of links to another VoiceXML / TwiML / PlivoXML document. However, VoiceXML also has the concept of subdialogs, where control is transferred to another VoiceXML application and the return value is passed back to the calling application. This can allow integration with (or development of) generic external services.
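
    TwiML example (following a link to another document). A sketch assuming the Redirect verb; the URL is a placeholder:

    <Response>
        <Say>One moment please.</Say>
        <!-- placeholder URL of the next TwiML document -->
        <Redirect method="POST">https://example.com/next-step.xml</Redirect>
    </Response>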

    VoiceXML example:

    <form id="billing_adjustment">
        <var name="account_number"/>
        <var name="home_phone"/>
        <subdialog name="accountinfo" src="acct_info.vxml#basic">
            <filled>
                <!-- Note the variable defined by "accountinfo" is
                returned as an ECMAScript object and it contains two
                properties defined by the variables specified in the
                "return" element of the subdialog. -->
    
                <assign name="account_number" expr="accountinfo.acctnum"/>
                <assign name="home_phone" expr="accountinfo.acctphone"/>
            </filled>
        </subdialog>
        ....
    </form>
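
    For context, the called dialog (the "basic" form in acct_info.vxml) might look like the sketch below, adapted from the subdialog example in the W3C's VXML documentation. The grammar paths are placeholders; the namelist on the return element is what populates accountinfo.acctnum and accountinfo.acctphone in the caller:

    <form id="basic">
        <field name="acctnum">
            <!-- grammar paths are placeholders -->
            <grammar type="application/srgs+xml" src="/grammars/digits.grxml"/>
            <prompt>What is your account number?</prompt>
        </field>
        <field name="acctphone">
            <grammar type="application/srgs+xml" src="/grammars/phone_numbers.grxml"/>
            <prompt>What is your home telephone number?</prompt>
            <filled>
                <!-- Hand both fields back to the calling dialog -->
                <return namelist="acctnum acctphone"/>
            </filled>
        </field>
    </form>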
    

    Examples based on / copied from Twilio's Docs, Nexmo's VXML Quickstarts, and the W3C's VXML Documentation.