Tags: c#, httpwebrequest, google-text-to-speech, .net-7.0

Trying to use Google TTS using HttpWebRequest but it doesn't work


I'm trying to use Google's TTS to convert text to speech, and I want to use HttpWebRequest for this because I want to avoid any third-party clients. (I'm practicing calling Web APIs.) So I have this code:

HttpWebRequest requestVoices = (HttpWebRequest)WebRequest.Create($"https://texttospeech.googleapis.com/v1/voices?key={bearerKey}");
using StreamReader streamReader = new(requestVoices.GetResponse().GetResponseStream());
var voiceList = GoogleVoices.LoadFromJson(streamReader.ReadToEnd()).Sort();
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speaking.json"), voiceList.SaveAsJson());

This works fine. I get a list of all voices, ordered by gender, then name. So, being ambitious, I want to generate a sound file, using this configuration:

var sound = new SpeechConfig()
{
    Input = new SpeechConfig.InputConfig(){Text = "Hello, World!"},
    AudioConfig = new SpeechConfig.AudioSpeechConfig()
    {
        AudioEncoding = "MP3",
        EffectsProfileId = new List<string>(){ "large-home-entertainment-class-device" },
        Pitch = 0.0,
        SpeakingRate = 1.0
    },
    Voice = new SpeechConfig.VoiceConfig()
    {
        Name = "en-IN",
        LanguageCode = "en-IN-Wavenet-D"
    }
};
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speak.json"), sound.SaveAsJson());

And to check if it generates proper JSON, this is what gets written to that file:

{
  "AudioConfig": {
    "AudioEncoding": "MP3",
    "EffectsProfileId": [
      "large-home-entertainment-class-device"
    ],
    "Pitch": 0.0,
    "SpeakingRate": 1.0
  },
  "Input": {
    "Text": "Hello, World!"
  },
  "Voice": {
    "LanguageCode": "en-IN-Wavenet-D",
    "Name": "en-IN"
  }
}

So, that looks good too. So now I need to call the TTS API, using this:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://texttospeech.googleapis.com/v1/text:synthesize/");
request.Headers.Add("Authorization", $"Bearer {bearerKey}");
request.Headers.Add("Content-Type", "application/json; charset=utf-8");
request.Method = "POST";
request.ContentType = "application/json";
using (StreamWriter streamWriter = new(request.GetRequestStream()))
{
    streamWriter.Write(sound.SaveAsJson());
    streamWriter.Flush();
    streamWriter.Close();
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string soundFileName= Path.Combine(Environment.CurrentDirectory, "Sound.mp3");
var soundFile = File.Create(soundFileName);
response.GetResponseStream().CopyTo(soundFile);
soundFile.Close();

But nothing gets written and I get this error:

Unhandled exception. System.Net.WebException: The remote server returned an error: (401) Unauthorized.
   at System.Net.HttpWebRequest.GetResponse()
   at Program.<Main>$(String[] args)...

Now, I know the bearer key is correct, as I get a list of voices.

Searching Google gives me far too many results, most of which use third-party libraries or talk about the "Translate" API, so that's not very useful. I then asked OpenAI (ChatGPT) for an example, which wasn't much different but used this as the URL:

"https://texttospeech.googleapis.com/v1/text:synthesize?key=YOUR_API_KEY&input.text=" + textToSynthesize;

Well, that doesn't work either. So I tried this, as something suggested it:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Headers.Add("Authorization", "Bearer $(gcloud auth print-access-token)");
request.Headers.Add("X-goog-api-key", $"{bearerKey}");

Still no luck. And there are no restrictions on this API key, except it's less than an hour old. Could it be that I just have to wait longer? Or am I doing something wrong? I must be doing something wrong, I think. I get the list of voices, but not the sound. What am I doing wrong?


Solution

  • And I'm an idiot! :D This part:

      "Voice": {
        "LanguageCode": "en-IN-Wavenet-D",
        "Name": "en-IN"
      }
    

    should be:

      "Voice": {
        "LanguageCode": "en-IN",
        "Name": "en-IN-Wavenet-D"
      }
    

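    Also, the JSON property names are case-sensitive: the API documentation spells them in lowerCamelCase (audioConfig, input, voice, languageCode, name), so PascalCase names like the ones in the output above may not be recognized.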
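
    To illustrate the casing point, here is a minimal sketch that serializes a request with lowerCamelCase property names. It assumes System.Text.Json; the record types are hypothetical stand-ins for the SpeechConfig classes in the question, not the actual classes, and ToRequestJson is an illustrative name.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.Json;

    // Hypothetical stand-ins for the SpeechConfig classes used in the question.
    public record InputConfig(string Text);
    public record VoiceConfig(string LanguageCode, string Name);
    public record AudioSpeechConfig(
        string AudioEncoding, List<string> EffectsProfileId, double Pitch, double SpeakingRate);
    public record SynthesizeRequest(InputConfig Input, VoiceConfig Voice, AudioSpeechConfig AudioConfig);

    public static class TtsJson
    {
        // CamelCase turns "AudioConfig" into "audioConfig", "LanguageCode" into "languageCode", etc.
        private static readonly JsonSerializerOptions Options = new()
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        };

        public static string ToRequestJson(SynthesizeRequest request) =>
            JsonSerializer.Serialize(request, Options);

        public static void Main()
        {
            var request = new SynthesizeRequest(
                new InputConfig("Hello, World!"),
                // LanguageCode is the locale; Name is the full voice name.
                new VoiceConfig("en-IN", "en-IN-Wavenet-D"),
                new AudioSpeechConfig(
                    "MP3", new List<string> { "large-home-entertainment-class-device" }, 0.0, 1.0));

            Console.WriteLine(ToRequestJson(request));
        }
    }
    ```

    With Newtonsoft.Json or another serializer, the equivalent would be a camel-case naming setting or explicit property-name attributes.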

    Working code:

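    // Requires: using System.Net.Http; using System.Text;
    public async Task Speak(string filename, string bearerKey, string text)
    {
        // Despite its name, bearerKey is sent as an API key in the "key" query parameter.
        string url = $"https://texttospeech.googleapis.com/v1/text:synthesize?key={bearerKey}";
        using var client = new HttpClient();
        using var content = new StringContent(Sound(text).SaveAsJson(), Encoding.UTF8, "application/json");
        using HttpResponseMessage response = await client.PostAsync(url, content);
        // Fail early on a non-2xx status instead of trying to parse an error body as audio.
        response.EnsureSuccessStatusCode();
        // The response is JSON; the audio arrives as a Base64 string, not as raw MP3 bytes.
        var result = AudioContent.LoadFromJson(await response.Content.ReadAsStringAsync());
        File.WriteAllBytes(filename, System.Convert.FromBase64String(result.Content));
    }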
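
    The AudioContent class is not shown here. As a rough sketch of what the decoding step amounts to, assuming the response body is a JSON object with a Base64 string in an audioContent field (as the API documentation describes), it could be done with System.Text.Json alone. AudioDecoder and DecodeAudio are illustrative names, and the sample payload is made up.

    ```csharp
    using System;
    using System.Text;
    using System.Text.Json;

    public static class AudioDecoder
    {
        // Extracts the Base64 payload from the response JSON and returns the raw audio bytes.
        public static byte[] DecodeAudio(string responseJson)
        {
            using JsonDocument doc = JsonDocument.Parse(responseJson);
            string base64 = doc.RootElement.GetProperty("audioContent").GetString()
                ?? throw new InvalidOperationException("audioContent was null.");
            return Convert.FromBase64String(base64);
        }

        public static void Main()
        {
            // Made-up payload: "SGVsbG8=" is the Base64 encoding of the ASCII text "Hello".
            byte[] bytes = DecodeAudio("{\"audioContent\":\"SGVsbG8=\"}");
            Console.WriteLine(Encoding.ASCII.GetString(bytes));
        }
    }
    ```

    This also explains why copying the response stream straight into Sound.mp3, as in the question, would not produce a playable file: the body is JSON, not MP3 data.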
    

    And yes, I decided to use HttpClient, as it was easier. This method is part of a class that holds the voice data. Sound is created by:

    public SpeechConfig Sound(string text) =>
        new()
        {
            Input = new SpeechConfig.InputConfig()
            {
                Text = text
            },
            AudioConfig = new SpeechConfig.AudioSpeechConfig()
            {
                AudioEncoding = "MP3",
                EffectsProfileId = new List<string>() { "large-home-entertainment-class-device" },
                Pitch = Value.Pitch,
                SpeakingRate = Value.SpeakingRate
            },
            Voice = new SpeechConfig.VoiceConfig()
            {
                Name = Value.Selection.Name,
                LanguageCode = Value.Selection.LanguageCode
            }
        };
    

    In the same class.
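
    For debugging failures like the 401 in the question, it can help to read the error response body rather than rely on the status code alone, since the body usually says more about what was rejected. The following is a minimal sketch under that assumption; TtsDiagnostics and ReadBodyOrThrowAsync are hypothetical names, and the error body in Main is invented for the demonstration.

    ```csharp
    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class TtsDiagnostics
    {
        // Returns the body on success; otherwise throws with the status code
        // and whatever error text the server sent back.
        public static async Task<string> ReadBodyOrThrowAsync(HttpResponseMessage response)
        {
            string body = await response.Content.ReadAsStringAsync();
            if (!response.IsSuccessStatusCode)
            {
                throw new HttpRequestException(
                    $"Request failed with status {(int)response.StatusCode}: {body}");
            }
            return body;
        }

        public static async Task Main()
        {
            // Locally built response standing in for a real API reply (invented body).
            using var fake = new HttpResponseMessage(HttpStatusCode.Unauthorized)
            {
                Content = new StringContent("{\"error\":{\"message\":\"example only\"}}")
            };
            try
            {
                await ReadBodyOrThrowAsync(fake);
            }
            catch (HttpRequestException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
    ```

    In the Speak method above, this would take the place of reading response.Content directly after PostAsync.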