I am trying to use the Watson Developer Cloud Java SDK to transcribe large audio files. The sessionless method works fine, but when I try the WebSockets method things become unreliable. Most of the time the method returns without passing any SpeechResults to the delegate; occasionally it works, but it only transcribes the first couple of seconds.
This is what my code looks like:
static SpeechResults transcript = null;

private static String SpeechToText(String audioFile) throws FileNotFoundException {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<!!USERNAME!!>", "<!!PASSWORD!!>");
    service.setEndPoint("https://stream.watsonplatform.net/speech-to-text/api");

    RecognizeOptions options = new RecognizeOptions();
    options.contentType("audio/ogg;codecs=opus");
    options.continuous(Boolean.TRUE);
    options.inactivityTimeout(-1);
    options.model(Models.GetModelName(Models.SpeechModelEnums.ArabicBroadband));
    options.timestamps(Boolean.TRUE);
    options.wordAlternativesThreshold(0.5);
    options.wordConfidence(Boolean.TRUE);
    options.interimResults(Boolean.FALSE);

    File audio = new File(audioFile);

    // This is my sessionless call
    // SpeechResults transcript = service.recognize(audio, options);

    service.recognizeUsingWebSockets(new FileInputStream(audio), options, new BaseRecognizeDelegate() {
        @Override
        public void onMessage(SpeechResults speechResults) {
            System.out.println(speechResults);
        }
    });

    return ""; // transcript.toString();
}
I have continuous mode enabled. I also tried toggling interimResults, but that did not help. What am I doing wrong?
The issue you are mentioning was fixed in version 3.0.0-RC1.
I've answered a similar question and added a code snippet that recognizes an audio file using WebSockets.
Starting from 3.0.0-RC1, there is a WebSocket example in the README.
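For reference, a WebSocket recognition sketch along the lines of the 3.0.0-RC1 README would look roughly like this. Note that the API changed from the delegate style used in the question: the callback type and method names shown here (RecognizeOptions.Builder, BaseRecognizeCallback, onTranscription, recognizeUsingWebSocket) reflect the newer SDK, so verify them against the README of the exact version you install before relying on this:

```java
import java.io.File;
import java.io.FileInputStream;

import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;
import com.ibm.watson.developer_cloud.speech_to_text.v1.websocket.BaseRecognizeCallback;

public class WebSocketExample {
  public static void main(String[] args) throws Exception {
    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword("<!!USERNAME!!>", "<!!PASSWORD!!>");

    // Options are built with a builder in the newer SDK,
    // rather than mutated on a plain RecognizeOptions instance.
    RecognizeOptions options = new RecognizeOptions.Builder()
        .contentType("audio/ogg;codecs=opus")
        .continuous(true)
        .interimResults(true)
        .build();

    File audio = new File("audio-file.ogg");

    // The call is asynchronous: results arrive through the callback.
    service.recognizeUsingWebSocket(new FileInputStream(audio), options,
        new BaseRecognizeCallback() {
          @Override
          public void onTranscription(SpeechResults speechResults) {
            System.out.println(speechResults);
          }
        });

    // Keep the JVM alive long enough for the transcription to complete;
    // a real application would signal completion from the callback instead.
    Thread.sleep(20000);
  }
}
```

Since the transcription happens on a background thread, the main thread must not exit before results arrive; the sleep above is the simplest placeholder for proper synchronization (e.g. a CountDownLatch released in the callback).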