python websocket speech-to-text ibm-watson

Can someone help me find the bug in my IBM Speech to Text code?


I am using WebSockets to send requests to IBM's Speech to Text API, and I keep getting a broken pipe error. The documentation (https://www.ibm.com/watson/developercloud/doc/speech-to-text/websockets.html#WSopen) says the API accepts up to 4 MB per frame, but anything over about 70 KB breaks the connection. If I send a file under 70 KB (about 5 seconds of audio), the request goes through, but nothing is returned to me.

    import websocket
    from requests import get
    import user_info
    import json
    import time
    import threading

    api_token = "https://stream.watsonplatform.net/authorization/api/v1/token"
    s2t_url = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
    s2t_model = 'es-ES_BroadbandModel'
    mb_chunk = 1024*50
    # https://pypi.python.org/pypi/websocket-clien*
    # https://www.ibm.com/watson/developercloud/doc/speech-to-text/websockets.html


    # -------
    # on_open
    # -------
    def on_open(ws):
        """
        Called by the websocket after it is
        opened and sends metadata about the sound file
        """
        print("--------------WebSocket is open--------------")
        message = {
            'action': 'start',
            'content-type': 'audio/wav'
        }
        #def send_binary(*args):
        ws.send(json.dumps(message))
        i = 0
        with open("Deepak2_hwv4122_uncompressed.wav", "rb") as wav:
            # while True:
            piece = wav.read(mb_chunk)
            ws.send(piece)
            print(i)
            i+=1
            if not piece:
                #break
                pass
            wav.close()
            # ws.close()
        #t = threading.Thread(target=send_binary)
        #t.start()


    # ----------
    # on_message
    # ----------
    def on_message(ws, message):
        print("------------------MESSAGE------------------")
        print(message)


    # --------
    # on_error
    # --------
    def on_error(ws, error):
        print(error)
        print("------------------ERROR------------------")
    # --------
    # on_close
    # --------
    def on_close(ws):
        print("------------Connection is Closed-----------")
        ws.close()

    # ----------------
    # get_token
    # ----------------
    def get_token():
        """
        REST request to get the watson voice service API token
        """
        url = api_token + "?url=" + user_info.AUTH['url']
        print("URL: " + url)
        res = get(url, auth=(user_info.AUTH['username'], user_info.AUTH['password']))
        print('Auth Token: ' + res.text)
        return res.text


    # ----
    # main
    # ----
    if __name__ == "__main__":
        global ws_url
        cur_token = get_token()
        ws_url = s2t_url + '?watson-token=' + cur_token + '&model=' + s2t_model
        print("ws_uri: " + ws_url)

        # Start WebSocket Connection
        websocket.enableTrace(True)
        ws = websocket.WebSocketApp(ws_url, on_message=on_message, on_error=on_error, on_close=on_close)
        ws.on_open = on_open
        ws.run_forever()

The error I am getting is:

    [Errno 32] Broken pipe
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_app.py", line 268, in _callback
        callback(self, *args)
      File "watson-test.py", line 35, in on_open
        ws.send(piece)
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_app.py", line 117, in send
        if not self.sock or self.sock.send(data, opcode) == 0:
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_core.py", line 234, in send
        return self.send_frame(frame)
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_core.py", line 259, in send_frame
        l = self._send(data)
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_core.py", line 423, in _send
        return send(self.sock, data)
      File "/home/dell/rahmi/env/lib/python3.5/site-packages/websocket/_socket.py", line 116, in send
        return sock.send(data)
      File "/usr/lib/python3.5/ssl.py", line 861, in send
        return self._sslobj.write(data)
      File "/usr/lib/python3.5/ssl.py", line 586, in write
        return self._sslobj.write(data)


Solution

  • I took a quick look at your code and saw that one part is missing: you are not signalling the end of the audio stream after pushing all the audio in the on_open method. You can signal the end of the audio by sending an empty binary message or a text message containing the JSON string {"action": "stop"}, as described here: https://www.ibm.com/watson/developercloud/doc/speech-to-text/websockets.html I believe that is why you do not get any result. Also, please make sure you do not close the websocket until the server has replied with the final result.

    Thank you for the answer, Sayuri Mizuguchi. I actually wrote the code hosted at https://github.com/watson-developer-cloud/speech-to-text-websockets-python , which is a simple example of interacting with Watson STT via websockets. That project is being integrated into the Watson Python SDK here: https://github.com/watson-developer-cloud/python-sdk

    Regarding conversion to base64: you just need to make sure that the audio is sent as a binary message, with no base64 encoding. WebSocket stacks usually let you send either a text message or a binary message.
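
To make the advice above concrete, here is a minimal sketch of the message order: a text "start" message, the audio as binary messages, then a text "stop" message. It is not the code from the linked repository. The names `stream_audio`, `has_final_result`, and `FakeSocket` are hypothetical; the opcode constants are the RFC 6455 values that websocket-client also exposes as `websocket.ABNF.OPCODE_TEXT` and `websocket.ABNF.OPCODE_BINARY`; and the check for a `"final"` flag inside `"results"` assumes the response shape described in the service documentation. The fake socket only records what would be sent, so the sketch runs without a network connection.

```python
import io
import json

# RFC 6455 opcodes (assumed to match websocket.ABNF.OPCODE_TEXT / OPCODE_BINARY).
OPCODE_TEXT = 0x1
OPCODE_BINARY = 0x2

CHUNK_SIZE = 1024 * 50  # same value as mb_chunk in the question


def stream_audio(ws, audio_file, chunk_size=CHUNK_SIZE):
    """Send 'start', every audio chunk as a binary message, then 'stop'.

    `ws` is any object with a send(data, opcode) method.
    Returns the number of audio chunks sent.
    """
    start = {"action": "start", "content-type": "audio/wav"}
    ws.send(json.dumps(start), OPCODE_TEXT)

    sent = 0
    while True:
        piece = audio_file.read(chunk_size)
        if not piece:  # end of file: stop reading, send nothing empty by accident
            break
        ws.send(piece, OPCODE_BINARY)  # raw bytes, no base64
        sent += 1

    # Tell the service that no more audio is coming.
    ws.send(json.dumps({"action": "stop"}), OPCODE_TEXT)
    return sent


def has_final_result(message):
    """Return True if a server message carries a result marked as final.

    Assumes messages look like {"results": [{"final": true, ...}]}.
    """
    payload = json.loads(message)
    return any(r.get("final") for r in payload.get("results", []))


class FakeSocket:
    """Hypothetical stand-in for the real socket; records frames in order."""

    def __init__(self):
        self.frames = []

    def send(self, data, opcode=OPCODE_TEXT):
        self.frames.append((opcode, data))


fake = FakeSocket()
chunks = stream_audio(fake, io.BytesIO(b"\x00" * 120000))
print(chunks, len(fake.frames))
```

In the question's code, `on_open` could open the WAV file and call `stream_audio(ws, wav)`, ideally from a separate thread as the commented-out `threading` lines suggest, and `on_message` could call `ws.close()` only once `has_final_result(message)` is true.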