c# httpwebrequest google-text-to-speech .net-7.0

Trying to use Google TTS using HttpWebRequest but it doesn't work

I'm trying to use Google's TTS to convert text to speech, and I want to use HttpWebRequest for this because I want to avoid any third-party clients. (I'm practicing calling web APIs.) So I have this code:

HttpWebRequest requestVoices = (HttpWebRequest)WebRequest.Create($"https://texttospeech.googleapis.com/v1/voices?key={bearerKey}");
using StreamReader streamReader = new(requestVoices.GetResponse().GetResponseStream());
var voiceList = GoogleVoices.LoadFromJson(streamReader.ReadToEnd()).Sort();
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speaking.json"), voiceList.SaveAsJson());

This works fine. I get a list of all voices, ordered by gender, then name. So, being ambitious, I want to generate a sound file, using this configuration:

var sound = new SpeechConfig()
{
    Input = new SpeechConfig.InputConfig(){Text = "Hello, World!"},
    AudioConfig = new SpeechConfig.AudioSpeechConfig()
    {
        AudioEncoding = "MP3",
        EffectsProfileId = new List<string>(){ "large-home-entertainment-class-device" },
        Pitch = 0.0,
        SpeakingRate = 1.0
    },
    Voice = new SpeechConfig.VoiceConfig()
    {
        Name = "en-IN",
        LanguageCode = "en-IN-Wavenet-D"
    }
};
File.WriteAllText(Path.Combine(Environment.CurrentDirectory, "Speak.json"), sound.SaveAsJson());

And to check if it generates proper JSON, this is what gets written to that file:

{
  "AudioConfig": {
    "AudioEncoding": "MP3",
    "EffectsProfileId": [
      "large-home-entertainment-class-device"
    ],
    "Pitch": 0.0,
    "SpeakingRate": 1.0
  },
  "Input": {
    "Text": "Hello, World!"
  },
  "Voice": {
    "LanguageCode": "en-IN-Wavenet-D",
    "Name": "en-IN"
  }
}

So, that looks good too. So now I need to call the TTS API, using this:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://texttospeech.googleapis.com/v1/text:synthesize/");
request.Headers.Add("Authorization", $"Bearer {bearerKey}");
request.Headers.Add("Content-Type", "application/json; charset=utf-8");
request.Method = "POST";
request.ContentType = "application/json";
using (StreamWriter streamWriter = new(request.GetRequestStream()))
{
    streamWriter.Write(sound.SaveAsJson());
    streamWriter.Flush();
    streamWriter.Close();
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string soundFileName= Path.Combine(Environment.CurrentDirectory, "Sound.mp3");
var soundFile = File.Create(soundFileName);
response.GetResponseStream().CopyTo(soundFile);
soundFile.Close();

But nothing gets written and I get this error:

Unhandled exception. System.Net.WebException: The remote server returned an error: (401) Unauthorized.
   at System.Net.HttpWebRequest.GetResponse()
   at Program.<Main>$(String[] args)...

Now, I know the bearer key is correct, as I get a list of voices.
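When GetResponse() throws, the server's reply is still attached to the exception, and Google's APIs normally put a JSON description of the problem in it. A minimal sketch for reading that body follows; WebErrors and ReadErrorBody are hypothetical names used only for illustration, not part of any library:

```csharp
using System.IO;
using System.Net;

public static class WebErrors
{
    // Returns the body the server sent along with the error status,
    // or an empty string when no response was received at all
    // (for example on a DNS or connection failure).
    public static string ReadErrorBody(WebException ex)
    {
        if (ex.Response is null)
        {
            return string.Empty;
        }

        using Stream stream = ex.Response.GetResponseStream();
        using StreamReader reader = new(stream);
        return reader.ReadToEnd();
    }
}
```

Wrapping the request.GetResponse() call in try/catch (WebException ex) and printing WebErrors.ReadErrorBody(ex) shows what the server actually complained about, which is usually more specific than the bare status code.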

Searching Google gives me way too many results, most of which use third-party libraries or talk about the "Translate" API, so that's not very useful. So I asked OpenAI's ChatGPT for an example, which wasn't much different but used this as the URL: "https://texttospeech.googleapis.com/v1/text:synthesize?key=YOUR_API_KEY&input.text=" + textToSynthesize; Well, that doesn't work either. So I tried this, as something suggested it:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Headers.Add("Authorization", "Bearer $(gcloud auth print-access-token)");
request.Headers.Add("X-goog-api-key", $"{bearerKey}");

Still no luck. And there are no restrictions on this API key, except it's less than an hour old. Could it be that I just have to wait longer? Or am I doing something wrong? I must be doing something wrong, I think. I get the list of voices, but not the sound. What am I doing wrong?

Solution

And I'm an idiot! :D This part:

  "Voice": {
    "LanguageCode": "en-IN-Wavenet-D",
    "Name": "en-IN"
  }

should be:

  "Voice": {
    "LanguageCode": "en-IN",
    "Name": "en-IN-Wavenet-D"
  }

Also, the JSON property names are case-sensitive...
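To make both fixes concrete, here is a minimal sketch of building the request body with the field names as the REST reference spells them (lowerCamelCase: input, voice, audioConfig, languageCode and so on). It assumes System.Text.Json; the record types and the TtsJson class are hypothetical stand-ins for SpeechConfig and SaveAsJson(), whose implementation isn't shown here.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-ins for the SpeechConfig classes used in the question.
public record InputConfig(string Text);
public record VoiceConfig(string LanguageCode, string Name);
public record AudioSpeechConfig(
    string AudioEncoding,
    List<string> EffectsProfileId,
    double Pitch,
    double SpeakingRate);
public record SpeechRequest(InputConfig Input, VoiceConfig Voice, AudioSpeechConfig AudioConfig);

public static class TtsJson
{
    // The camelCase policy turns "LanguageCode" into "languageCode",
    // "AudioConfig" into "audioConfig", and so on.
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase
    };

    // languageCode is the locale (e.g. "en-IN"); voiceName is the full
    // voice identifier (e.g. "en-IN-Wavenet-D"). Swapping them was the bug.
    public static string Build(string text, string languageCode, string voiceName) =>
        JsonSerializer.Serialize(
            new SpeechRequest(
                new InputConfig(text),
                new VoiceConfig(languageCode, voiceName),
                new AudioSpeechConfig(
                    "MP3",
                    new List<string> { "large-home-entertainment-class-device" },
                    0.0,
                    1.0)),
            Options);
}
```

Calling TtsJson.Build("Hello, World!", "en-IN", "en-IN-Wavenet-D") produces a body whose voice object has languageCode set to "en-IN" and name set to "en-IN-Wavenet-D".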

Working code:

public async Task Speak(string filename, string bearerKey, string text)
{
    // The API key is passed in the "key" query parameter.
    string url = $"https://texttospeech.googleapis.com/v1/text:synthesize?key={bearerKey}";
    using var client = new HttpClient();
    using var content = new StringContent(Sound(text).SaveAsJson(), Encoding.UTF8, "application/json");
    using HttpResponseMessage response = await client.PostAsync(url, content);
    // Fail early on 4xx/5xx instead of trying to parse an error body as audio.
    response.EnsureSuccessStatusCode();
    // The response JSON carries the audio as a base64-encoded string.
    var result = AudioContent.LoadFromJson(await response.Content.ReadAsStringAsync());
    await File.WriteAllBytesAsync(filename, System.Convert.FromBase64String(result.Content));
}

And yes, I decided to use HttpClient since it was easier. This method is part of a class that holds the voice data. Sound is created by:

public SpeechConfig Sound(string text) =>
    new()
    {
        Input = new SpeechConfig.InputConfig()
        {
            Text = text
        },
        AudioConfig = new SpeechConfig.AudioSpeechConfig()
        {
            AudioEncoding = "MP3",
            EffectsProfileId = new List<string>() { "large-home-entertainment-class-device" },
            Pitch = Value.Pitch,
            SpeakingRate = Value.SpeakingRate
        },
        Voice = new SpeechConfig.VoiceConfig()
        {
            Name = Value.Selection.Name,
            LanguageCode = Value.Selection.LanguageCode
        }
    };

In the same class.
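The AudioContent class isn't shown above. The synthesize response is a JSON object whose audioContent field holds the audio as base64, so the decode step can be sketched as follows. This assumes System.Text.Json, and TtsResponse and DecodeAudio are hypothetical names, not the class I actually used:

```csharp
using System;
using System.Text.Json;

public static class TtsResponse
{
    // Pulls the base64 "audioContent" string out of the response JSON
    // and decodes it into the raw audio bytes (MP3 in this case).
    public static byte[] DecodeAudio(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        string base64 = doc.RootElement.GetProperty("audioContent").GetString()
            ?? throw new InvalidOperationException("audioContent was null.");
        return Convert.FromBase64String(base64);
    }
}
```

The returned bytes can be written straight to disk, for example with File.WriteAllBytes(filename, TtsResponse.DecodeAudio(json)).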