I'm trying to use Google's TTS to convert text to speech, and I want to use HttpWebRequest for this because I want to avoid any third-party clients. (I'm practicing calling Web APIs.) So I have this code:
HttpWebRequest requestVoices = (HttpWebRequest)WebRequest.Create($"https://texttospeech.googleapis.com/v1/voices?key={bearerKey}");
using StreamReader streamReader = new(requestVoices.GetResponse().GetResponseStream());
var voiceList = GoogleVoices.LoadFromJson(streamReader.ReadToEnd()).Sort();
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speaking.json"), voiceList.SaveAsJson());
This works fine. I get a list of all voices, ordered by gender, then name. So, being ambitious, I want to generate a sound file, using this configuration:
var sound = new SpeechConfig()
{
    Input = new SpeechConfig.InputConfig() { Text = "Hello, World!" },
    AudioConfig = new SpeechConfig.AudioSpeechConfig()
    {
        AudioEncoding = "MP3",
        EffectsProfileId = new List<string>() { "large-home-entertainment-class-device" },
        Pitch = 0.0,
        SpeakingRate = 1.0
    },
    Voice = new SpeechConfig.VoiceConfig()
    {
        Name = "en-IN",
        LanguageCode = "en-IN-Wavenet-D"
    }
};
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speak.json"), sound.SaveAsJson());
And to check if it generates proper JSON, this is what gets written to that file:
{
    "AudioConfig": {
        "AudioEncoding": "MP3",
        "EffectsProfileId": [
            "large-home-entertainment-class-device"
        ],
        "Pitch": 0.0,
        "SpeakingRate": 1.0
    },
    "Input": {
        "Text": "Hello, World!"
    },
    "Voice": {
        "LanguageCode": "en-IN-Wavenet-D",
        "Name": "en-IN"
    }
}
That looks good too. Now I need to call the TTS API, using this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://texttospeech.googleapis.com/v1/text:synthesize/");
request.Headers.Add("Authorization", $"Bearer {bearerKey}");
request.Headers.Add("Content-Type", "application/json; charset=utf-8");
request.Method = "POST";
request.ContentType = "application/json";
using (StreamWriter streamWriter = new(request.GetRequestStream()))
{
    streamWriter.Write(sound.SaveAsJson());
    streamWriter.Flush();
    streamWriter.Close();
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string soundFileName = Path.Combine(Environment.CurrentDirectory, "Sound.mp3");
var soundFile = File.Create(soundFileName);
response.GetResponseStream().CopyTo(soundFile);
soundFile.Close();
But nothing gets written, and I get this error:
Unhandled exception. System.Net.WebException: The remote server returned an error: (401) Unauthorized.
   at System.Net.HttpWebRequest.GetResponse()
   at Program.<Main>$(String[] args)...
Now, I know the bearer key is correct as I get a list of voices.
Searching Google gives me way too many results, most of which use third-party libraries or talk about the "Translate" API, so that's not very useful. So I asked OpenAI's ChatGPT for an example, which wasn't much different but used this as the URL: "https://texttospeech.googleapis.com/v1/text:synthesize?key=YOUR_API_KEY&input.text=" + textToSynthesize;
Well, that doesn't work either. So I tried this, which was suggested elsewhere:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Headers.Add("Authorization", "Bearer $(gcloud auth print-access-token)");
request.Headers.Add("X-goog-api-key", $"{bearerKey}");
Still no luck. There are no restrictions on this API key, except that it's less than an hour old. Could it be that I just have to wait longer? I get the list of voices but not the sound, so I think I must be doing something wrong. What is it?
And I'm an idiot! :D This part:
"Voice": {
    "LanguageCode": "en-IN-Wavenet-D",
    "Name": "en-IN"
}
should be:
"Voice": {
    "LanguageCode": "en-IN",
    "Name": "en-IN-Wavenet-D"
}
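Also, the JSON property names are case-sensitive: the API documents them in camelCase (input, voice, audioConfig, languageCode and so on), not the PascalCase that my classes were writing.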
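For anyone hitting the same casing problem, here is a minimal sketch of getting camelCase names out of a serializer. It assumes System.Text.Json; my own SaveAsJson helper isn't shown here and may work differently, and the anonymous object is only a hypothetical stand-in for SpeechConfig.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for SpeechConfig: an anonymous type with PascalCase
// property names, the usual C# convention.
var request = new
{
    Input = new { Text = "Hello, World!" },
    Voice = new { LanguageCode = "en-IN", Name = "en-IN-Wavenet-D" },
    AudioConfig = new { AudioEncoding = "MP3", Pitch = 0.0, SpeakingRate = 1.0 }
};

// CamelCase policy lowercases the first letter of every property name on output,
// so AudioConfig is written as audioConfig, LanguageCode as languageCode, etc.
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
string json = JsonSerializer.Serialize(request, options);

Console.WriteLine(json);
```

The C# side keeps its normal PascalCase properties; only the serialized names change.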
Working code:
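public async Task Speak(string filename, string bearerKey, string text)
{
    // The API key is passed in the "key" query parameter.
    string url = $"https://texttospeech.googleapis.com/v1/text:synthesize?key={bearerKey}";
    using var client = new HttpClient();
    using var content = new StringContent(Sound(text).SaveAsJson(), Encoding.UTF8, "application/json");
    using HttpResponseMessage response = await client.PostAsync(url, content);
    // Throw on a non-success status instead of trying to parse an error body as audio.
    response.EnsureSuccessStatusCode();
    // The response is JSON; the audio itself is a base64-encoded string inside it.
    var result = AudioContent.LoadFromJson(await response.Content.ReadAsStringAsync());
    File.WriteAllBytes(filename, System.Convert.FromBase64String(result.Content));
}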
And yes, I decided to use HttpClient as it was easier. This method is part of a class that holds the voice data. Sound is created by:
public SpeechConfig Sound(string text) =>
    new()
    {
        Input = new SpeechConfig.InputConfig()
        {
            Text = text
        },
        AudioConfig = new SpeechConfig.AudioSpeechConfig()
        {
            AudioEncoding = "MP3",
            EffectsProfileId = new List<string>() { "large-home-entertainment-class-device" },
            Pitch = Value.Pitch,
            SpeakingRate = Value.SpeakingRate
        },
        Voice = new SpeechConfig.VoiceConfig()
        {
            Name = Value.Selection.Name,
            LanguageCode = Value.Selection.LanguageCode
        }
    };
In the same class.
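Since AudioContent.LoadFromJson isn't shown, here is a rough sketch of what that step boils down to: the synthesize response is a JSON object whose audioContent field holds the audio as base64. This version uses JsonDocument from System.Text.Json instead of my AudioContent class, and the sample body is made up ("SGVsbG8=" is just base64 for the text "Hello", not real audio).

```csharp
using System;
using System.Text.Json;

// Made-up sample response body; a real one carries the MP3 bytes as base64.
string responseJson = "{\"audioContent\":\"SGVsbG8=\"}";

// Parse the JSON and pull out the base64 string from the audioContent field.
using JsonDocument doc = JsonDocument.Parse(responseJson);
string base64 = doc.RootElement.GetProperty("audioContent").GetString() ?? "";

// Decode to raw bytes; with a real response these are what gets written to the .mp3 file.
byte[] audio = Convert.FromBase64String(base64);

Console.WriteLine($"Decoded {audio.Length} bytes");
```

With a real response, the decoded bytes can go straight into File.WriteAllBytes, as in Speak above.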