google-api google-cloud-functions text-to-speech

Google Cloud Text-to-Speech - Timepoint returns an empty array

I am using the Google TTS API and would like to use timepoints to show the words of a sentence at the right time (like subtitles). Unfortunately, I cannot get this to work.

HTTP request

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

Request body

{
  "input": {
    "ssml": "<speak>Hello World</speak>"
  },
  "voice": {
    "languageCode": "nl-NL",
    "name": "nl-NL-Wavenet-E",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  },
  "enableTimePointing": [
    "SSML_MARK"
  ]
}

Response body

{
    "audioContent": "base64",
    "timepoints": [],
    "audioConfig": {
        "audioEncoding": "MP3",
        "speakingRate": 1,
        "pitch": 0,
        "volumeGainDb": 0,
        "sampleRateHertz": 24000,
        "effectsProfileId": []
    }
}

I'm expecting Timepoint objects in the response, but as you can see, it returns an empty array.

Solution

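To get timepoints, you need to add <mark> tags to your SSML input. With SSML_MARK enabled, the API returns one timepoint per <mark> tag, so input without any marks yields an empty array. Here is an example based on your request body.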

Request body:

{
  "input": {
    "ssml": "<speak><mark name=\"1st\"/>Hello <mark name=\"2nd\"/>world</speak>"
  },
  "voice": {
    "languageCode": "nl-NL",
    "name": "nl-NL-Wavenet-E",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  },
  "enableTimePointing": [
    "SSML_MARK"
  ]
}

I added <mark name=\"1st\"/> and <mark name=\"2nd\"/> to show how to use multiple marks. If you only need a single mark, remove the second one and the response will contain a single timepoint.
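
If you want one timepoint per word (the subtitle use case), you could generate the marks instead of writing them by hand. Below is a minimal Python sketch, not an official client. The helper names (build_marked_ssml, build_request_body, words_with_times) and the w0, w1, ... mark naming scheme are my own assumptions, and it assumes each returned timepoint carries markName and timeSeconds fields, as described in the v1beta1 reference. It only builds the request body and maps timepoints back to words; sending the request and authentication are left out.

```python
import json
from xml.sax.saxutils import escape


def build_marked_ssml(sentence):
    """Put a <mark/> in front of every whitespace-separated word.

    Returns (ssml, words). Mark "w0" belongs to words[0], "w1" to
    words[1], and so on (hypothetical naming scheme).
    """
    words = sentence.split()
    # escape() guards against &, < and > breaking the SSML markup
    parts = [f'<mark name="w{i}"/>{escape(word)}' for i, word in enumerate(words)]
    return "<speak>" + " ".join(parts) + "</speak>", words


def build_request_body(sentence):
    """Build a text:synthesize body with the same voice settings as above."""
    ssml, _ = build_marked_ssml(sentence)
    return {
        "input": {"ssml": ssml},
        "voice": {
            "languageCode": "nl-NL",
            "name": "nl-NL-Wavenet-E",
            "ssmlGender": "FEMALE",
        },
        "audioConfig": {"audioEncoding": "MP3"},
        "enableTimePointing": ["SSML_MARK"],
    }


def words_with_times(words, timepoints):
    """Pair each word with the time of its mark (None if the mark is missing).

    Assumes each timepoint is a dict with "markName" and "timeSeconds".
    """
    by_mark = {tp["markName"]: tp["timeSeconds"] for tp in timepoints}
    return [(word, by_mark.get(f"w{i}")) for i, word in enumerate(words)]


if __name__ == "__main__":
    # Print the JSON you would POST to the v1beta1 text:synthesize endpoint
    print(json.dumps(build_request_body("Hello world"), indent=2))
```

Once you have the response, passing the word list and the "timepoints" array to words_with_times gives you (word, seconds) pairs that you can use to schedule the subtitles.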

Response (I just included a snippet of the base64):