Twilio Media Streams - Twilio is not connecting to my websocket server


I'm developing an application that uses Twilio Media Streams to process audio data in real time. However, the call disconnects immediately after Twilio makes its POST request to my /voice endpoint.

Here are the details of my setup:

  1. I am using Flask to host my application and Flask-Sockets to handle WebSocket connections.
  2. My /voice endpoint returns a TwiML response with the <Start> verb to start a media stream. The WebSocket URL in the TwiML response is wss://my-ngrok-subdomain.ngrok.io/stream.
  3. My WebSocket server is running and accessible from the internet. I have confirmed this by testing the WebSocket connection independently of Twilio using a WebSocket client.
  4. I have enabled detailed logging in my application and WebSocket server, but I have not found any errors that could explain why the call disconnects or why Twilio never connects to my WebSocket.

Despite these measures, the call still disconnects immediately after the POST request to the /voice endpoint. I have checked the Twilio logs, and the last entry is the request to /voice. Nothing in the logs indicates that Twilio attempts to connect to my WebSocket.

Here is the relevant part of my WebSocket server code:

def handle_audio(ws):
    logging.info('Handling audio')  # Added logging
    while not ws.closed:
        message = ws.receive()
        if message is None:
            logging.info("No message received...")
            continue

        # Messages are a JSON encoded string
        data = json.loads(message)

        # Using the event type you can determine what type of message you are receiving
        if data['event'] == "connected":
            logging.info("Connected Message received: {}".format(message))
        elif data['event'] == "start":
            logging.info("Start Message received: {}".format(message))
        elif data['event'] == "media":
            logging.info("Media message: {}".format(message))
            payload = data['media']['payload']
            logging.info("Payload is: {}".format(payload))
            audio_data = base64.b64decode(payload)
            logging.info("That's {} bytes".format(len(audio_data)))

            # Split the audio data on silence
            audio_chunks = whisper_handler.split_on_silence(audio_data)
            logging.info('Split audio data into %d chunks', len(audio_chunks))  # Added logging

            # Transcribe each audio chunk
            transcriptions = [whisper_handler.transcribe_audio(chunk) for chunk in audio_chunks]
            logging.info('Transcribed audio chunks: %s', transcriptions)  # Added logging
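        elif data['event'] in ("stop", "closed"):
            # Twilio's documented end-of-stream event is "stop"; "closed" is kept from the original code
            logging.info("Stop Message received: {}".format(message))
            break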
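
The parsing in the handler above can be pulled out of the socket loop so it can be checked without a live call. This is a minimal sketch, assuming the message shape the handler already relies on (an `event` field, and a base64 `media.payload` for media messages). `parse_stream_message` is a hypothetical helper name, not part of Twilio's library, and the sample message is hand-built rather than captured from a real call.

```python
import base64
import json


def parse_stream_message(message):
    """Return (event, audio_bytes) for one JSON message from the stream.

    audio_bytes is None for every event other than "media".
    """
    data = json.loads(message)
    event = data.get("event")
    if event == "media":
        # The payload is base64-encoded audio, as in the handler above
        return event, base64.b64decode(data["media"]["payload"])
    return event, None


# Hand-built sample message (an assumption, not real Twilio output)
sample = json.dumps({
    "event": "media",
    "media": {"payload": base64.b64encode(b"\x00\x01\x02").decode("ascii")},
})
event, audio = parse_stream_message(sample)
print(event, len(audio))
```

Keeping the decoding in a pure function like this makes it easier to tell a parsing problem apart from a connection problem, which is the harder one to see in the logs.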

I would appreciate any assistance you could provide in resolving this issue. Please let me know if you need any additional information.


Solution

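  • The call was ending because my TwiML contained nothing after <Start>. Starting a stream does not block: Twilio moves straight on to the next verb, and when there are no more verbs it hangs up. So I needed to add something that keeps the call alive:

    response.say("hello! how can I help!", voice='woman')
    response.pause(length=60)

    So that my code now looks like this: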

    from flask import Flask, Response
    from twilio.twiml.voice_response import VoiceResponse, Start

    app = Flask(__name__)

    @app.route('/voice', methods=['GET', 'POST'])
    def voice():
        response = VoiceResponse()

        # Use the <Start> verb to start a Media Stream
        start = Start()
        start.stream(name='My Audio Stream', url='wss://subdomain.ngrok-free.app/stream')  # Added name

        response.append(start)  # Append the Start object to the VoiceResponse object

        # Keep the call alive after the stream starts; otherwise Twilio hangs up
        response.say("hello! how can I help!", voice='woman')
        response.pause(length=60)

        # Return a Response object with the Content-Type header set to application/xml
        return Response(str(response), mimetype='application/xml')
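
To see what the endpoint above sends back, it can help to look at the shape of the XML itself. The sketch below rebuilds the same structure with only the standard library, so it runs without the Twilio helper package. It is an illustration of the document layout, not the helper library's own output, and `build_twiml` and its parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET


def build_twiml(stream_url, keep_alive_seconds=60):
    """Build a <Response> that starts a stream and then keeps the call open."""
    response = ET.Element("Response")

    # <Start><Stream> begins streaming and returns control right away
    start = ET.SubElement(response, "Start")
    ET.SubElement(start, "Stream", {"name": "My Audio Stream", "url": stream_url})

    # These verbs run after the stream starts and stop the call from ending
    say = ET.SubElement(response, "Say")
    say.text = "hello! how can I help!"
    ET.SubElement(response, "Pause", {"length": str(keep_alive_seconds)})

    return ET.tostring(response, encoding="unicode")


twiml = build_twiml("wss://subdomain.ngrok-free.app/stream")
print(twiml)
```

The verbs that follow <Start> are what keep the call open. With only <Start> in the response there is nothing left for the call to do once the stream has been requested.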