
Watson SpeechToText Java and JavaScript model differences


I'm integrating the watson-speech.js JavaScript library with a Spring-based server that uses the Watson Java SDK. I'm trying to send the output of a WatsonSpeech.SpeechToText.recognizeMicrophone call to the server, with no luck. The speech Java classes appear to have the appropriate @SerializedName annotations that match the JSON sent from the client, but Jackson throws an UnrecognizedPropertyException:

Unrecognized field "keywords_result" (class com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults), not marked as ignorable (2 known properties: "resultIndex", "results"])

Here's the controller method:

    @RequestMapping(value = "/postWatsonRequest", method = RequestMethod.POST)
    @ResponseBody
    @ResponseStatus(value=HttpStatus.OK)
    public ResponseObject postWatsonRequest(@RequestBody SpeechResults speechResults) {
    ...
    }

I'm clearly missing something. Do I need to unpack the JSON manually on the server side (with a custom deserializer?) or reformat it into an acceptable JSON string on the client side?


Solution

  • It turned out to be a couple of mistakes on my part. I'm not sure this is the best solution, but it does work. Here's the full code for anyone who's interested. The key things that made it work:

    • You must use the receive-json event to capture the full JSON result. The data event appears to return only the final text.
    • The result data had to be wrapped in a valid JSON wrapper, data:{message:data} (this was my big mistake).
    • Do not include contentType: 'application/json; charset=utf-8' in the ajax call, or the controller will not recognize the data. Without it, jQuery sends message as an ordinary form parameter, which is what @RequestParam("message") in the controller reads.
    • The Watson Java SDK WebSocketManager receives an okhttp3.ResponseBody from Watson, from which it extracts a string. I presume this is similar to what the JavaScript SDK receives, so I used the same code from the WebSocketManager to convert the JSON.stringify string to a SpeechResults object in the controller.

    From the okhttp3.ResponseBody javadoc:

    A one-shot stream from the origin server to the client application with the raw bytes of the response body

    Watson JavaScript

    function listen(token) {
        stream = WatsonSpeech.SpeechToText.recognizeMicrophone({
            token: token,
            readableObjectMode: true,
            objectMode: true,
            word_confidence: true,
            format: false,
            keywords: keywordsArray,
            keywords_threshold : 0.5,
            continuous : false
            //interim_results : false
            //keepMicrophone: navigator.userAgent.indexOf('Firefox') > 0
        });
    
        stream.setEncoding('utf8');
    
        stream.on('error', function(err) {
            console.log(err);
            stream.stop();
        });
    
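        stream.on('receive-json', function(msg) {
            console.log(msg);
            // State and error messages carry no results array, so check before indexing
            if (msg.state !== 'listening' && msg.results && msg.results.length > 0) {
                if (msg.results[0].final) {
                    // Pass the object as a separate argument; '+' would print [object Object]
                    console.log('receive-json:', msg);
                    postResults(msg);
                    stream.stop();
                }
            }
        });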
    }
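
The check inside the receive-json handler can be pulled out into a small function that is easy to test on its own. This is only a sketch: the message shapes it assumes (a state message such as { state: 'listening' } and a result message with a results array whose first entry has a final flag) are taken from the handler above, not from the Watson documentation, and isFinalResult is a hypothetical helper name.

```javascript
// Sketch: decide whether a 'receive-json' message carries a final transcript.
// Assumed message shapes (inferred from the handler above):
//   { state: 'listening' }
//   { results: [ { final: true, alternatives: [ { transcript: '...' } ] } ] }
function isFinalResult(msg) {
    // State and error messages have no results array, so bail out early
    if (!msg || !Array.isArray(msg.results) || msg.results.length === 0) {
        return false;
    }
    // Interim results have final === false; only the last one is final
    return msg.results[0].final === true;
}
```

With this in place, the handler body reduces to: if (isFinalResult(msg)) { postResults(msg); stream.stop(); }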
    

    Ajax post

    function postResults(results) {
        var data = JSON.stringify(results);
        console.log('stringify: ' + data);
        $.ajax({
            type: 'POST',
            url: appContextPath + '/postWatsonResult',
            dataType: 'json',
            data: {message:data}
        })
        .done(function(data) {
            // dataType: 'json' means data is already a parsed object
            console.log('done data:', data);
        })
        .fail(function(jqXHR, status, error) {
            var data = jqXHR.responseJSON;
            console.log('fail data:', data);
        });
    }
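
To see why the wrapper and the missing contentType matter, it helps to look at the shape of the request body. With data: {message: data} and no contentType, jQuery falls back to its default form encoding, so the server receives a single form field named message whose value is the JSON text. The sketch below reproduces that shape with the standard URLSearchParams class rather than jQuery itself, so the exact bytes may differ slightly; sample is a made-up stand-in for a real Watson message, and buildFormBody and readMessageParam are hypothetical helper names.

```javascript
// Sketch: approximate the form-encoded body sent for data: {message: data}.
// `sample` is invented illustration data, not real Watson output.
var sample = {
    result_index: 0,
    results: [{ final: true, alternatives: [{ transcript: 'hello world' }] }]
};

// Client side: one form field, 'message', holding the stringified JSON
function buildFormBody(results) {
    var params = new URLSearchParams();
    params.append('message', JSON.stringify(results));
    return params.toString();
}

// Server side, in spirit: decode the 'message' field, then parse its JSON text
function readMessageParam(body) {
    return JSON.parse(new URLSearchParams(body).get('message'));
}

var body = buildFormBody(sample);
console.log(body);
console.log(readMessageParam(body).results[0].alternatives[0].transcript);
```

The controller below does the second half of this: @RequestParam("message") hands it the decoded string, and Gson parses that string into a SpeechResults object.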
    

    Spring controller

    @RequestMapping(value = "/postWatsonResult", method = RequestMethod.POST)
    @ResponseBody
    @ResponseStatus(value=HttpStatus.OK)
    public ResponseObject postWatsonResult(@RequestParam("message") String message, Locale locale) {
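        logger.info("postWatsonResult");
        // message holds the JSON text produced by JSON.stringify on the client
        JsonObject json = new JsonParser().parse(message).getAsJsonObject();
        SpeechResults results = null;
        // Only messages that contain a "results" member map onto SpeechResults
        if (json.has("results")) {
            // GSON is assumed to be a Gson instance declared elsewhere in the controller (not shown)
            results = GSON.fromJson(message, SpeechResults.class);
        }
        if (results != null && !results.getResults().isEmpty()) {
            logger.debug("results: " + results.getResults().get(0).getAlternatives().get(0).getTranscript());
        }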
    
        return new ResponseObject();
    }
    

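    The original error is most likely because @SerializedName is a Gson annotation, which Jackson does not read; that is why Jackson knew only the properties resultIndex and results. I still think it should be possible to use @RequestBody SpeechResults speechResults somehow, so I'll keep experimenting, but at least I have a working solution.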