Google Cloud Text to Speech on Ruby

According to the documentation for the client libraries, there is a Ruby client library, the google-cloud-text_to_speech gem. But the only Ruby-specific instruction that documentation gives is the gem install command.

The next step in that documentation is making the actual request, but the example for it is only available in Go, Java, Node.js, and Python.

I think I can translate the Node.js example into Ruby for my Ruby on Rails app, but I'm unclear where to set the credentials environment variable. For Google Cloud Storage under Active Storage, I use a service account key file, referenced like this:

credentials: <%= Rails.root.join("google-credentials.json") %>

This JSON file is, of course, in my .gitignore, but it is also set up on Heroku.
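One common way to handle a git-ignored key file on Heroku is to keep the key's JSON contents in a config var instead of a file. Below is a minimal sketch under that assumption. `GOOGLE_CREDENTIALS_JSON` is a hypothetical config var name, not one Google defines, and the sketch relies on the generated Ruby client's `config.credentials` accepting either a key file path String or a Hash of the key's contents.

```ruby
require "json"

# Hypothetical config var holding the raw service account JSON
# (e.g. set with `heroku config:set`); not a name defined by Google.
CREDENTIALS_ENV_VAR = "GOOGLE_CREDENTIALS_JSON"

# Returns a value intended for the client's `config.credentials`:
# a Hash parsed from the env var when it is set and non-blank,
# otherwise the local key file path converted to a String.
def google_credentials(default_path, env: ENV)
  json = env[CREDENTIALS_ENV_VAR]
  return JSON.parse(json) if json && !json.strip.empty?

  default_path.to_s
end
```

Locally you would call something like `google_credentials(Rails.root.join("google-credentials.json"))` and get the path back; on Heroku the config var takes precedence, so the file never needs to be deployed.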

Based on the Node.js example, I suspect a translation along these lines, but I don't see where to put the authentication information:

def synthesize_speech
  # Create the instance of client:
  client = TextToSpeech.new

  # Define the text that will be synthesized.
  text = self.content

  # Define the request
  request = {
    input: {
      text: text
    },
    voice: {
      languageCode: "en-US",
      name: "en-US-Studio-M",
      ssmlGender: "MALE"
    },
    audioConfig: {
      audioEncoding: "MP3"
    }
  }.to_json

  # Get the response
  response = client.synthesizeSpeech(request)
  response = JSON.parse(response.body)

  # OPTION 1: Extract the base64 string and put it in the database:
  self.base64 = response["audioContent"]

  # OPTION 2: Extract the base64 and write it to a temp file
  File.open("/tmp/base64.txt", "w") do |f|
    f.puts response["audioContent"]
  end
end
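The attempt above carries over the Node.js camelCase field names (`languageCode`, `ssmlGender`, `audioConfig`). The Ruby client in the solution below uses snake_case names instead (`language_code`, `audio_config`) and takes a Hash rather than a JSON string, so `.to_json` is not needed. A minimal stdlib sketch for converting a Node.js-style request Hash, assuming the Ruby field names are simply the snake_case forms of the camelCase ones:

```ruby
# Recursively converts camelCase Hash keys (as in the Node.js sample)
# into snake_case Symbols, leaving values such as "en-US" untouched.
def snake_case_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      snake = key.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
      out[snake.to_sym] = snake_case_keys(val)
    end
  when Array
    value.map { |item| snake_case_keys(item) }
  else
    value
  end
end

node_style = {
  input: { text: "Hello" },
  voice: { languageCode: "en-US", name: "en-US-Studio-M", ssmlGender: "MALE" },
  audioConfig: { audioEncoding: "MP3" }
}
request = snake_case_keys(node_style)
# request[:audio_config] == { audio_encoding: "MP3" }
```

The resulting Hash could then be passed as `client.synthesize_speech(**request)`.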

Once I get the above working, I can worry about converting the result to an .mp3 file. So what I need to figure out here is how to authenticate with the API.

Solution

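If you use Google's official Ruby client, the google-cloud-text_to_speech gem (its reference documentation has more details), you can pass the key file to the client when you create it: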

client = Google::Cloud::TextToSpeech.text_to_speech do |config|
  # Path to the service account key file; .to_s turns the Pathname into a String
  config.credentials = Rails.root.join("google-credentials.json").to_s
end

response = client.synthesize_speech(
  input: { text: 'Lorem ipsum' },
  voice: { name: 'en-US-Studio-M', language_code: 'en-US' },
  audio_config: {audio_encoding: 'MP3'}
)

response.audio_content
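Applied to the question's model method, a sketch might look like the following. `SpeechNote` is a hypothetical plain-Ruby stand-in for the asker's model (only `content` is taken from the question), and the client is passed in so it can be the real `Google::Cloud::TextToSpeech` client created above or a test double.

```ruby
# Hypothetical stand-in for the asker's model; `content` is the text to speak.
class SpeechNote
  attr_reader :content

  def initialize(content)
    @content = content
  end

  # Sends the same request as the solution through the injected client
  # and returns the raw MP3 bytes from the response.
  def synthesize_speech(client)
    response = client.synthesize_speech(
      input: { text: content },
      voice: { name: "en-US-Studio-M", language_code: "en-US" },
      audio_config: { audio_encoding: "MP3" }
    )
    response.audio_content
  end
end
```

Injecting the client keeps the credentials setup in one place and lets the method be exercised without network access.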

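Note that audio_content is a binary string holding the raw audio bytes; unlike the audioContent field in the REST API's JSON response, it is not base64-encoded.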
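If you still want OPTION 1 from the question (a base64 string in the database), you therefore have to encode the bytes yourself. A minimal sketch using core `pack`/`unpack1` with the `m0` directive, which is equivalent to `Base64.strict_encode64`/`strict_decode64`; the method names are hypothetical:

```ruby
# Encodes raw audio bytes as strict (no newlines) base64 for a text column.
def audio_to_base64(audio_content)
  [audio_content].pack("m0")
end

# Decodes the stored string back into raw bytes (ASCII-8BIT);
# raises ArgumentError if the input is not valid strict base64.
def base64_to_audio(encoded)
  encoded.unpack1("m0")
end
```

Storing the file itself (for example via Active Storage) is usually lighter than base64 in a column, which inflates the data by about a third.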

So to write it to an MP3 file, open the file in binary mode ("wb") instead of plain "w":

File.open("test.mp3", "wb") do |file|
  file.write response.audio_content
end
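If the MP3 should end up in Active Storage rather than at a fixed path, a temporary file opened in binary mode works the same way. A minimal sketch using stdlib `Tempfile.create`; `with_mp3_tempfile` is a hypothetical helper name:

```ruby
require "tempfile"

# Writes raw MP3 bytes to a binary-mode temp file, rewinds it, and yields it.
# Tempfile.create closes and deletes the file when the block returns.
def with_mp3_tempfile(audio_content)
  Tempfile.create(["speech", ".mp3"], binmode: true) do |file|
    file.write(audio_content)
    file.flush
    file.rewind
    yield file
  end
end
```

For example, `with_mp3_tempfile(response.audio_content) { |f| record.audio.attach(io: f, filename: "speech.mp3", content_type: "audio/mpeg") }`, where `record.audio` is a hypothetical attachment. This assumes the record is already saved so the upload happens while the file still exists.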