Tags: python, string, byte, text-to-speech, vlc

Creating a simple IBM assistant using their TTS and STT services: I get a bytes/str error when I play the audio with VLC. How can I fix this?


This is the code. Its purpose is to have IBM's Text to Speech generate the audio and VLC play it from a script run inside the Python IDE. It is my first step toward the assistant. I think this question differs from a typical str/bytes error because the data comes from IBM Cloud rather than from a simple program bug.

import vlc
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("API Key Here")
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)

text_to_speech.set_service_url(
    'https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/113cd664-f07b-44fe-a11d-a46cc50caf84')

# define VLC instance
instance = vlc.Instance('--input-repeat=-1', '--fullscreen')

# Define VLC player
player = instance.media_player_new()

# Define VLC media
media = instance.media_new(
    text_to_speech.synthesize(
        'Hello world',
        voice='en-US_AllisonVoice',
        accept='audio/wav').get_result().content)

# Set player media
player.set_media(media)

# Play the media
player.play()

I get this error...

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/IBM Test/iBM tEST.py", line 24, in <module>
    accept='audio/wav').get_result().content)
  File "C:\Users\PycharmProjects\IBM Test\venv\lib\site-packages\vlc.py", line 1947, in media_new
    if ':' in mrl and mrl.index(':') > 1:
TypeError: a bytes-like object is required, not 'str'
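The last two frames show where this comes from: python-vlc's media_new() runs ':' in mrl on its argument, which only works if mrl is a str, while .content is bytes. A minimal reproduction needs neither VLC nor IBM Cloud. The byte string below is a stand-in for the real WAV data, and looks_like_url is a hypothetical helper that copies the check shown in the traceback.

```python
# Stand-in for the WAV bytes returned by synthesize(...).get_result().content
audio = b"RIFF\x24\x00\x00\x00WAVEfmt "


def looks_like_url(mrl):
    """Hypothetical helper: the same test vlc.py's media_new() runs on its argument."""
    return ':' in mrl and mrl.index(':') > 1


print(looks_like_url("https://example.com/hello.wav"))  # True: treated as a URL
print(looks_like_url("C:/Users/hello_world.wav"))        # False: treated as a local path

try:
    looks_like_url(audio)  # bytes on the right of `in`, str on the left
except TypeError as exc:
    print(exc)  # a bytes-like object is required, not 'str'
```

In other words, whatever reaches media_new() has to be a str naming something VLC can open, not the audio itself.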

I have tried this...

text_to_speech.synthesize('Hello world'.encode(), ...)

I get this error back...

b'Hello world' is not JSON serializable
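This second error points the other way: the SDK appears to send the text to the service inside a JSON request body, and Python's json module cannot serialize bytes. So synthesize() should keep receiving an ordinary str; the mismatch is on the VLC side. A standard-library illustration, assuming a {"text": ...} payload shape (an assumption about what the SDK sends) and using a hypothetical helper named to_json_body:

```python
import json


def to_json_body(text):
    """Hypothetical helper: serialize text the way a JSON request body would."""
    return json.dumps({"text": text})


# A str serializes fine, which is what synthesize('Hello world', ...) relies on
print(to_json_body("Hello world"))  # {"text": "Hello world"}

# bytes cannot be serialized, matching the error seen after 'Hello world'.encode()
try:
    to_json_body(b"Hello world")
except TypeError as exc:
    print(exc)  # exact wording varies between Python versions
```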

If anyone recognizes this issue, please let me know what I could be doing wrong. I am trying to have a simple line of text spoken aloud from a script run in my Python IDE (PyCharm).

I know that the following block works: it comes directly from IBM's API documentation, and I have run it myself to test...

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator('{apikey}')
text_to_speech = TextToSpeechV1(
    authenticator=authenticator
)

text_to_speech.set_service_url('{url}')

with open('hello_world.wav', 'wb') as audio_file:
    audio_file.write(
        text_to_speech.synthesize(
            'Hello world',
            voice='en-US_AllisonVoice',
            accept='audio/wav'        
        ).get_result().content)

This code converts the input text to speech and saves it as a WAV file called hello_world.wav. I am trying to integrate that into a system that plays the speech directly from my script instead of saving it. If anyone knows of any alternatives to VLC, please let me know.
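One VLC-free option, assuming you stay on Windows (the paths in the traceback suggest you are): the standard-library winsound module can play a WAV image straight from memory with the SND_MEMORY flag, so the .content bytes never have to be written to disk. winsound exists only on Windows and plays WAV only, which is why accept='audio/wav' matters. This is a sketch untested against the service's actual WAV output; play_wav_bytes and looks_like_wav are hypothetical helper names.

```python
import sys


def looks_like_wav(data):
    """Cheap sanity check: a RIFF/WAVE file starts with b'RIFF' and has b'WAVE' at offset 8."""
    return (isinstance(data, (bytes, bytearray))
            and data[:4] == b"RIFF" and data[8:12] == b"WAVE")


def play_wav_bytes(data):
    """Play in-memory WAV bytes with winsound (Windows only); blocks until done."""
    if sys.platform != "win32":
        raise RuntimeError("winsound is only available on Windows")
    if not looks_like_wav(data):
        raise ValueError("expected WAV bytes; request accept='audio/wav'")
    import winsound  # imported here so this module still loads on other platforms

    # SND_MEMORY: `data` is a memory image of a WAV file. It cannot be combined
    # with SND_ASYNC, so this call returns only after playback finishes.
    winsound.PlaySound(data, winsound.SND_MEMORY)


# Usage with the objects from the question:
# audio = text_to_speech.synthesize('Hello world', voice='en-US_AllisonVoice',
#                                   accept='audio/wav').get_result().content
# play_wav_bytes(audio)
```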


Solution

  • If you pay close attention to the error message, you will see that the error is actually raised inside the vlc code, which implies that the output from the TTS service is not what vlc is expecting.

    You need to break up your code and first verify what output you are getting from TTS. If it is audio, you can then work out what form the vlc code expects it in. I suspect it is not in the format that TTS is outputting.

    Updated answer

    The output from TTS is a stream of audio data; in Python, get_result().content is a bytes object. VLC, however, is looking for a string: media_new() takes an MRL, a string naming a file path or URL, and the ':' in mrl check in your traceback fails when it is handed bytes instead. Raw audio data cannot be passed that way, but a file destination can. So I think you need to write the audio to a file and give that file's path to VLC.

    IMHO, based on the question you are asking and the code you have cobbled together, your coding skills are not yet up to the challenge, and you may be better off spending a couple of weeks working through some Python tutorials. You may find that the investment in training time pays off, rather than struggling with what are quite fundamental coding issues here.
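Following the suggestion to write the audio to a file and give its path to VLC, a minimal sketch. It assumes the python-vlc package and a working VLC install; save_audio_to_temp_wav, play_with_vlc and play_tts_audio are hypothetical helper names, and the 30-second timeout and polling interval are arbitrary choices. It also keeps the script alive while the sound plays, since player.play() returns immediately and a script that exits right after it may cut playback short.

```python
import os
import tempfile
import time


def save_audio_to_temp_wav(audio_bytes):
    """Write TTS audio bytes to a temporary .wav file and return its path (a str)."""
    if not isinstance(audio_bytes, (bytes, bytearray)):
        raise TypeError(f"expected audio bytes, got {type(audio_bytes).__name__}")
    # delete=False so the file survives close() and VLC can open it by name on Windows
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_bytes)
        return tmp.name


def play_with_vlc(path, timeout=30.0):
    """Hand VLC a file path (a str MRL) instead of raw bytes, then wait for playback."""
    import vlc  # python-vlc; imported lazily so the file helper works without it

    instance = vlc.Instance()
    player = instance.media_player_new()
    player.set_media(instance.media_new(path))  # str path: no bytes/str TypeError
    player.play()
    time.sleep(0.5)  # give the player a moment to start before polling its state
    deadline = time.monotonic() + timeout
    while player.is_playing() and time.monotonic() < deadline:
        time.sleep(0.1)
    player.stop()


def play_tts_audio(audio_bytes):
    """Save the audio, play it with VLC, then delete the temporary file."""
    path = save_audio_to_temp_wav(audio_bytes)
    try:
        play_with_vlc(path)
    finally:
        os.remove(path)


# Usage with the objects from the question:
# play_tts_audio(text_to_speech.synthesize('Hello world', voice='en-US_AllisonVoice',
#                                          accept='audio/wav').get_result().content)
```

The type check in save_audio_to_temp_wav doubles as the "verify what output you are getting from TTS" step: if the SDK ever hands back something other than bytes, the error appears there instead of deep inside vlc.py.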