Google Cloud Text to Speech on Ruby


According to the documentation for the client libraries, there is a Ruby client library, the google-cloud-text_to_speech gem. But the only thing the documentation gives for that specific client library is the gem install command.

The next step in that documentation is to make the actual request, but the example is only available in Go, Java, Node.js and Python.

I think I can translate the Node.js example into Ruby for use in Ruby on Rails. But I'm unclear on where to set the environment variable. For Google Cloud Storage under Active Storage, I use a key file for a service account, which is referenced this way:

credentials: <%= Rails.root.join("google-credentials.json") %>

This JSON file is of course in my .gitignore, but it is set up on Heroku as well.

Based on the Node.js example, I suspect a translation along these lines, but I do not see where to put the authentication information.

def synthesize_speech
  # Create the instance of client:
  client = TextToSpeech.new

  # Define the text that will be synthesized.
  text = self.content

  # Define the request
  request = {
    input: {
      text: text
    },
    voice: {
      languageCode: "en-US",
      name: "en-US-Studio-M",
      ssmlGender: "MALE"
    },
    audioConfig: {
      audioEncoding: "MP3"
    }
  }.to_json

  # Get the response
  response = client.synthesizeSpeech(request)
  response = JSON.parse(response.body)

  # OPTION 1: Extract the base64 string and put it in the database:
  self.base64 = response["audioContent"]

  # OPTION 2: Extract the base64 and write it to a temp file
  File.open("/tmp/base64.txt", "w") do |f|
    f.puts response["audioContent"]
  end
end

Once I get the above working, I can worry about the conversion to an .mp3 file. So what I need now is to figure out how to authenticate with the API.


Solution

  • If you use the official Ruby client from Google (see its documentation for more information), you can set the credentials in the configuration block when creating the client:

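    require "google/cloud/text_to_speech"

    client = Google::Cloud::TextToSpeech.text_to_speech do |config|
      # Pass the key file path as a String (Rails.root.join returns a Pathname)
      config.credentials = Rails.root.join("google-credentials.json").to_s
    end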
    
    response = client.synthesize_speech(
      input: { text: 'Lorem ipsum' },
      voice: { name: 'en-US-Studio-M', language_code: 'en-US' },
      audio_config: {audio_encoding: 'MP3'}
    )
    
    response.audio_content
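
If shipping the key file to the server is awkward (on Heroku, for example), one option is to keep the key's JSON in a config var and hand the parsed Hash to config.credentials, which accepts a Hash as well as a file path. Below is a minimal sketch of that idea, not something the gem provides: the helper name tts_credentials and the variable name GOOGLE_CREDENTIALS_JSON are made up for illustration.

```ruby
require "json"

# Hypothetical helper: returns a value suitable for config.credentials.
# Prefers the JSON key stored in an environment variable (name assumed),
# and falls back to the path of a local key file.
def tts_credentials(env = ENV, fallback_path = "google-credentials.json")
  raw = env["GOOGLE_CREDENTIALS_JSON"]
  return fallback_path if raw.nil? || raw.strip.empty?

  JSON.parse(raw) # a Hash with the service account key fields
end

# Usage with the client shown above (needs the gem, so it is not run here):
#
#   client = Google::Cloud::TextToSpeech.text_to_speech do |config|
#     config.credentials = tts_credentials
#   end
```

With this approach the key never has to be committed; locally the fallback path keeps pointing at the ignored JSON file.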
    

    Note that audio_content is binary data (the raw audio bytes), not a Base64 string.

    So if you want to write it to an MP3 file, you need to open the file in "wb" mode instead of just "w":

    File.open("test.mp3", "wb") do |file|
      file.write response.audio_content
    end
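
For the first option in the question (keeping a Base64 string in the database), the binary audio_content can be encoded with Ruby's standard Base64 module and decoded again before the file is written. A small sketch follows; it uses stand-in bytes in place of a real API response so that it runs without the gem, and the method names encode_audio and write_mp3 are illustrative only.

```ruby
require "base64"

# Encode raw audio bytes (e.g. response.audio_content) for a text column.
def encode_audio(audio_bytes)
  Base64.strict_encode64(audio_bytes)
end

# Decode the stored Base64 string and write the bytes in binary mode.
def write_mp3(base64_string, path)
  File.binwrite(path, Base64.strict_decode64(base64_string))
end

# Stand-in for response.audio_content; these are not real MP3 data.
audio_bytes = "\xFF\xFB\x90\x00fake".b
encoded = encode_audio(audio_bytes)
```

strict_encode64 produces a single line without newlines, which is convenient to store, and File.binwrite opens the file in binary mode just like "wb" does.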