
IBM Watson Speech to Text using WebSockets


I am trying to use the Watson Developer Cloud Java SDK to transcribe large audio files. The sessionless method works fine, but the WebSockets method is unreliable.

Most of the time the method just returns without any SpeechResults being passed to the delegate; occasionally it works, but then it transcribes only the first couple of seconds.

This is what my code looks like:

static SpeechResults transcript = null;

private static String SpeechToText(String audioFile) throws FileNotFoundException {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<!!USERNAME!!>", "<!!PASSWORD!!>");
    service.setEndPoint("https://stream.watsonplatform.net/speech-to-text/api");

    RecognizeOptions options = new RecognizeOptions();
    options.contentType("audio/ogg;codecs=opus");
    options.continuous(Boolean.TRUE);
    options.inactivityTimeout(-1);
    options.model(Models.GetModelName(Models.SpeechModelEnums.ArabicBroadband));
    options.timestamps(Boolean.TRUE);
    options.wordAlternativesThreshold(0.5);
    options.wordConfidence(Boolean.TRUE);
    options.interimResults(Boolean.FALSE);

    File audio = new File(audioFile);

    // This is my sessionless call
    // SpeechResults transcript = service.recognize(audio, options);

    service.recognizeUsingWebSockets(new FileInputStream(audio), options, new BaseRecognizeDelegate() {
        @Override
        public void onMessage(SpeechResults speechResults) {
            System.out.println(speechResults);
        }
    });

    return ""; // transcript.toString();
}

I have continuous enabled. I tried fiddling with interimResults but that did not work.

What am I doing wrong?


Solution

  • The issue you describe was fixed in version 3.0.0-RC1.
    I've answered a similar question and added a code snippet that recognizes an audio file using WebSockets.

    Starting with 3.0.0-RC1, there is a WebSocket example in the README.
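
Separately from the SDK fix, a general hazard with callback-based calls like the one in the question is that the calling method can return (and the program can exit) before any callback has fired. The following is a minimal sketch of one way to wait for results, using only JDK classes. `fakeAsyncRecognize` is a hypothetical stand-in for an asynchronous recognize call; it is not part of the Watson SDK, and whether the real SDK exposes a completion callback depends on the version you use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class WaitForCallbacks {

    // Hypothetical stand-in for an asynchronous recognize call (NOT the Watson SDK API).
    // It delivers each result on a background thread, then signals completion.
    static void fakeAsyncRecognize(List<String> chunks, Consumer<String> onMessage, Runnable onDone) {
        Thread worker = new Thread(() -> {
            for (String chunk : chunks) {
                onMessage.accept(chunk); // one callback per result
            }
            onDone.run(); // no more results will arrive
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Starts the async call and blocks until it signals completion or the timeout elapses.
    static List<String> recognizeAndWait(List<String> chunks, long timeoutSeconds)
            throws InterruptedException {
        List<String> results = new CopyOnWriteArrayList<>(); // written by the worker thread
        CountDownLatch done = new CountDownLatch(1);

        fakeAsyncRecognize(chunks, results::add, done::countDown);

        // Without this wait, the method would return before the callbacks run.
        if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Timed out waiting for results");
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(recognizeAndWait(Arrays.asList("hello", "world"), 5));
    }
}
```

With a real SDK call, the latch would be counted down from whatever callback marks the end of the stream (or an error), and the bounded `await` keeps the caller from hanging forever if that callback never arrives.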