
Working with JSON in the Watson Python SDK


I am working on a project that will hopefully allow me to combine the Watson Python SDK implementations of Speech to Text, Conversation, and Text to Speech. However, I am running into some problems working with the JSON data in Python 2.7. I am trying to do two things:

1) I want to extract just the transcript values from the JSON data and, ideally, combine them into one easily readable string for use later in the program.

2) I also need to manipulate the JSON so that I can use it as input for the Conversation or Text to Speech parts. Basically, how can I convert what's provided in the JSON into acceptable input for the other Watson modules?

What I've tried so far: I read the Python 2.7 json docs and tried to convert the data back into a Python dictionary, which sort of worked. However, every key and value had a "u" in front of it, none of the regular dictionary methods seemed to work on them, and they don't look like standard key: value pairs. I was able to put all of the JSON data into one variable, though. My code is below (ignore the print statements; I was just checking how the data looked at each step). It's mostly what you can get from the examples section on GitHub.
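For reference, the "u" in front of each key and value is only how Python 2.7 displays unicode strings; it is not part of the data. json.loads returns an ordinary dict, so indexing, keys(), get() and the in operator all work on it. A minimal sketch, using a hand-written JSON string shaped like one entry of the Speech to Text "results" list (the string is illustrative, not captured SDK output):

```python
import json

# A JSON string shaped like one entry of the Speech to Text "results" list.
raw = '{"final": true, "alternatives": [{"confidence": 1.0, "transcript": "I still have a dream "}]}'

data = json.loads(raw)  # an ordinary dict; on Python 2.7 its strings are unicode objects

print(sorted(data.keys()))     # Python 2.7 shows [u'alternatives', u'final']; 3.x shows no u
print(data.get('final'))       # True
print('alternatives' in data)  # True

# Plain str keys work for lookups, because u'abc' == 'abc' for ASCII text on 2.7.
best = data['alternatives'][0]
print(best['transcript'].strip())  # I still have a dream
```

The same lookups work on the full response, since it is just nested dicts and lists.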

Just a quick final question, too: is the Python SDK limited in any way compared to the other SDKs (Java, JavaScript, etc.)? Their output seems much easier to work with.

import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1


speech_to_text = SpeechToTextV1(
    username='###########',
    password='###########',
    x_watson_learning_opt_out=True
)

with open(join(dirname(__file__), '/home/user/Desktop/output.wav'),'rb') as audio_file:

    json_str = json.dumps(speech_to_text.recognize(audio_file, content_type='audio/wav', timestamps=False,
                                                   word_confidence=False, model='en-US_NarrowbandModel'),
                          indent=2)

print json_str

json_dict = json.loads(json_str)

print json_dict



def main(args):
    return 0

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))
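A minimal sketch of the difference between json_str and json_dict in the code above, using a hand-written stand-in for the recognition result (its shape, results → alternatives → transcript, is assumed for illustration): json.dumps produces a str that cannot be indexed by key, while json.loads gives back an ordinary dict equal to the original.

```python
import json

# Hand-written stand-in for the recognition result (assumed shape, not live SDK output).
response = {
    "results": [
        {"alternatives": [{"confidence": 1.0, "transcript": "I still have a dream "}], "final": True}
    ],
    "result_index": 0,
}

json_str = json.dumps(response, indent=2)  # a str: only useful for printing or saving
json_dict = json.loads(json_str)           # back to a dict equal to the original

try:
    json_str['results']                    # indexing a string with a key raises TypeError
except TypeError as err:
    print('Cannot index the string: {}'.format(err))

# The dict (original or round-tripped) can be navigated directly.
print(json_dict['results'][0]['alternatives'][0]['transcript'].strip())
```

In other words, the dumps/loads round trip adds nothing; the dict you started with is already usable.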

Solution

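  • The issue appears to be that you are dumping your JSON to a string and then trying to access it as an object. The response returned by recognize() can be indexed directly, so json.dumps is only needed for pretty-printing.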

    The following sample code works:

    import json
    from os.path import join, dirname
    from watson_developer_cloud import SpeechToTextV1
    
    
    speech_to_text = SpeechToTextV1(
        username='....',
        password='....',
        x_watson_learning_opt_out=True
    )
    
    with open('../blog/ihaveadream.wav','rb') as audio_file:
        response = speech_to_text.recognize(audio_file, content_type='audio/wav', timestamps=False, word_confidence=False, model='en-US_NarrowbandModel')
    
    print json.dumps(response, indent=2)
    

    This prints the following:

    {
      "results": [
        {
          "alternatives": [
            {
              "confidence": 1.0, 
              "transcript": "I still have a dream "
            }
          ], 
          "final": true
        }, 
        {
          "alternatives": [
            {
              "confidence": 0.999, 
              "transcript": "it is a dream deeply rooted in the American dream I have a dream "
            }
          ], 
          "final": true
        }, 
        {
          "alternatives": [
            {
              "confidence": 1.0, 
              "transcript": "that one day this nation will rise up and live out the true meaning of its creed we hold these truths to be self evident that all men are created equal "
            }
          ], 
          "final": true
        }
      ], 
      "result_index": 0, 
      "warnings": [
        "Unknown arguments: continuous."
      ]
    }
    

    So, to access the confidence and transcript of the first result, you can do the following:

    print 'Confidence: {}'.format(response['results'][0]['alternatives'][0]['confidence'])
    print 'Transcript: {}'.format(response['results'][0]['alternatives'][0]['transcript'])
    

    The output of that would be:

    Confidence: 1.0
    Transcript: I still have a dream
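To get all of the transcripts as one readable string (the first goal in the question), you can loop over every result instead of indexing only the first one. A minimal sketch, assuming the response layout shown above and that the first alternative in each result is the one to keep (as in the sample, where each result has a single alternative); join_transcripts is a hypothetical helper name, not part of the SDK:

```python
def join_transcripts(response, final_only=True):
    """Join the first transcript of every result into one readable string.

    Assumes response['results'] is a list of results, each holding an
    'alternatives' list whose first entry is the transcript to keep.
    """
    pieces = []
    for result in response.get('results', []):
        if final_only and not result.get('final', False):
            continue  # skip interim (non-final) results
        alternatives = result.get('alternatives') or []
        if alternatives:
            # Strip the trailing space each transcript carries before joining.
            pieces.append(alternatives[0]['transcript'].strip())
    return ' '.join(pieces)


# Usage with the first two results from the sample output above.
sample = {
    "results": [
        {"alternatives": [{"confidence": 1.0, "transcript": "I still have a dream "}], "final": True},
        {"alternatives": [{"confidence": 0.999,
                           "transcript": "it is a dream deeply rooted in the American dream I have a dream "}],
         "final": True},
    ],
    "result_index": 0,
}
print(join_transcripts(sample))
# I still have a dream it is a dream deeply rooted in the American dream I have a dream
```

A plain string like this is the kind of text you would typically pass on to the Conversation or Text to Speech calls; check the exact parameter names for your SDK version.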