Watson Speech to Text service: how to return a language other than English in Java


When I run the STT Java code with the model set to "zh-CN_BroadbandModel", I do not get the expected result.

Here is my sample code:

public static void main (String[] args) {

    SpeechToText service = new SpeechToText();
    service.setUsernameAndPassword(USERNAME, PASSWORD);

    File file = new File("C:/IBM/Watson/APIs/speech-to-text/test.wav");

    Map<String, Object> params = new HashMap<String, Object>();
    params.put("audio", file);
    params.put("content_type","audio/wav");
    params.put("model", "zh-CN_BroadbandModel");

    SpeechResults transcript = service.recognize(params);

    System.out.println(transcript);
}

The SpeechResults output is as follows:

{"results": [{
      "final": true,
      "alternatives": [
        {"transcript": "?? ? ? ? ?? ? ? ? ?? ??? ? ??? ?? ? ? ?? ?? ? ??? ? ?? ? ?? ?? ? ?? ? ?? ? ?? ?? ? "}]}],
  "result_index": 0
}

I tried changing the model to "en-US_BroadbandModel". With the same wav file, it returns English words (although the audio is in Chinese), so I think the "model" setting does take effect.

But in the response, I can see that the locale is en_US.

Is there any way to set the language?


Solution

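  • This is an issue with the console output encoding used by Java on Windows.

    The default console output encoding on Windows is not UTF-8 but a legacy code page (CP850 on many Western European systems), so characters that the code page cannot represent are printed as "?".

    Use a PrintStream with an explicit encoding to view the results.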

    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println(transcript);
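
The effect described above can be reproduced without calling the service at all. The following is a minimal sketch, not part of the original answer: the class name ConsoleEncodingDemo and the helper encodeViaPrintStream are hypothetical, and US-ASCII is used only as a stand-in for a console code page that has no Chinese characters. It shows that a PrintStream replaces characters it cannot encode with "?", while a UTF-8 PrintStream preserves them.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {

    // Hypothetical helper: prints `text` through a PrintStream that encodes
    // with `charsetName` and returns the raw bytes a console would receive.
    public static byte[] encodeViaPrintStream(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not affect the demonstration.
        String sample = "\u4f60\u597d";

        // US-ASCII stands in for a code page without Chinese characters:
        // each unmappable character is replaced by '?'.
        byte[] lossy = encodeViaPrintStream(sample, "US-ASCII");
        System.out.println(new String(lossy, StandardCharsets.US_ASCII));

        // UTF-8 can represent the text, so it survives the round trip.
        byte[] utf8 = encodeViaPrintStream(sample, "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(sample));

        // The fix from the answer, applied to the real console.
        PrintStream console = new PrintStream(System.out, true, "UTF-8");
        console.println(sample);
    }
}
```

Note that the last line only displays correctly if the terminal itself interprets the bytes as UTF-8; in the Windows command prompt this typically means switching the code page first (for example with chcp 65001) and using a font that contains the characters.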