javascript http-post google-cloud-speech unity-webgl

How to send an HTTP POST request to Google Cloud Speech-to-Text using plain JavaScript

I am creating a Unity WebGL build and need to use the Google Cloud Speech-to-Text API in my app. Unity does not support microphones in WebGL builds, but there is a workaround using jslib files that access the microphone through JavaScript code. The problem is that, no matter what I try or where I look, I cannot find documentation on how to submit this data to Google Cloud for processing with an HTTP POST request in plain JavaScript (I cannot easily use other methods or libraries, and I don't want this code to be too complicated). I created an API key (it might not be necessary...), but every request I send, such as the code below, returns Error Code 400 Bad Request or similar:

fetch("https://speech.googleapis.com/v1/speech:recognize?key=API_KEY", {
    method: "POST",
    body: JSON.stringify(payload),
    headers: {
        "Content-Type": "application/json"
    }
})
.then(response => response.json())
.then(data => {
    // Process the response
    processResponse(data);
})
.catch(error => {
    console.error('Error:', error);
});
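
A 400 response from this endpoint normally carries a JSON body explaining what was rejected, so reading it before processing the result can narrow the problem down. The following is a minimal sketch, assuming the body follows the common Google API error shape ({ error: { code, message, status } }). The names describeApiError and recognize are hypothetical helpers, not part of any Google library:

```javascript
// Hypothetical helper: turn an HTTP status and a parsed JSON body into a readable message.
// Assumes the common Google API error shape:
//   { "error": { "code": 400, "message": "...", "status": "INVALID_ARGUMENT" } }
function describeApiError(httpStatus, body) {
    const err = body && body.error;
    if (!err) {
        return "HTTP " + httpStatus + " (no error details in body)";
    }
    const status = err.status ? " " + err.status : "";
    return "HTTP " + httpStatus + status + ": " + (err.message || "unknown error");
}

// Hypothetical wrapper around the request above: rejects with the API's own
// error message instead of passing an error body on as if it were a result.
// fetchImpl is optional and only exists so the function can be exercised without a network.
function recognize(url, payload, fetchImpl) {
    const doFetch = fetchImpl || fetch;
    return doFetch(url, {
        method: "POST",
        body: JSON.stringify(payload),
        headers: { "Content-Type": "application/json" }
    }).then(function (response) {
        return response.json().then(function (data) {
            if (!response.ok) {
                throw new Error(describeApiError(response.status, data));
            }
            return data;
        });
    });
}
```

With this in place, the .catch handler logs the reason Google gives for the 400 rather than a generic failure.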

Heck, I even tried asking ChatGPT 4 and got no answer. I admit I am not a JavaScript person, let alone an expert, so if anyone is familiar with creating such requests, please share your knowledge with me. Thank you!

EDIT: Since it appears to be somewhat unclear, this is the full code (I don't care about conventions or styling at the moment, I need the core functionality to work first):

StartRecording: function () {
    console.log("Beginning of StartRecording");

    // Function to send audio data to Google Cloud Speech-to-Text API
    var sendToSpeechToText = function (blob) {
        console.log("Beginning of SendToSpeechToText");
        const apiKey = '<REMOVED>'; // Replace with your Google Cloud API key
        const url = `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`;
        const reader = new FileReader();

        reader.onload = function () {
            const base64data = reader.result;
            const audioBytes = base64data.split('base64,')[1];

            const requestData = {
                config: {
                    encoding: 'WEBM_OPUS',
                    sampleRateHertz: 16000,
                    languageCode: 'en-US'
                },
                audio: {
                    content: audioBytes
                }
            };

            fetch(url, {
                method: 'POST',
                body: JSON.stringify(requestData),
                headers: {
                    'Content-Type': 'application/json'
                }
            })
            .then(response => response.json())
            .then(data => {
                console.log("Data Received!");
                // Process the response data (transcript)
                window.alert(data["results"]["0"]["alternatives"]["0"]["transcript"]);
            })
            .catch(error => console.error('Error:', error));
        };
        console.log("End of SendToSpeechToText");
        reader.readAsDataURL(blob);
    };

    var handleSuccess = function (stream) {
        console.log("Beginning of HandleSuccess");
        const options = {
            mimeType: 'audio/webm'
        };
        const recordedChunks = [];
        const mediaRecorder = new MediaRecorder(stream, options);

        mediaRecorder.addEventListener('dataavailable', function (e) {
            if (e.data.size > 0) recordedChunks.push(e.data);
        });

        mediaRecorder.addEventListener('stop', function () {
            sendToSpeechToText(new Blob(recordedChunks));
        });

        mediaRecorder.start();
        // For example, stop recording after 5 seconds
        setTimeout(() => {
            mediaRecorder.stop();
        }, 5000);

        console.log("End of HandleSuccess");
    };

    navigator.mediaDevices.getUserMedia({
        audio: {
            deviceId: "default",
            sampleRate: 16000,
            sampleSize: 16,
            channelCount: 1
        },
        video: false
    })
    .then(handleSuccess);
    console.log("End of StartRecording");
}
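
One thing that may be worth checking in the code above is whether the request body matches what was actually recorded. The sketch below builds the body in one place so that it can be logged and inspected before sending. buildRecognizeRequest is a hypothetical helper, and the 48000 in the usage comment is an assumption: browsers commonly encode Opus at 48 kHz whatever sampleRate constraint was requested, so it is a value to try, not a confirmed fix:

```javascript
// Hypothetical helper: build the JSON body for speech:recognize from the
// data URL produced by FileReader.readAsDataURL.
function buildRecognizeRequest(dataUrl, sampleRateHertz, languageCode) {
    const marker = "base64,";
    const index = dataUrl.indexOf(marker);
    if (index === -1) {
        throw new Error("Expected a base64 data URL");
    }
    // Everything after "base64," is the encoded audio itself.
    const content = dataUrl.slice(index + marker.length);
    if (content.length === 0) {
        throw new Error("Recording is empty");
    }
    return {
        config: {
            encoding: "WEBM_OPUS",
            sampleRateHertz: sampleRateHertz,
            languageCode: languageCode
        },
        audio: {
            content: content
        }
    };
}

// Usage inside reader.onload (48000 is an assumption to verify, see above):
//   const requestData = buildRecognizeRequest(reader.result, 48000, "en-US");
//   console.log(JSON.stringify(requestData.config), requestData.audio.content.length);
```

Logging the config and the content length makes it easier to spot an empty recording or a mismatched sample rate before the request is sent.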

I also tried adding Authorization: 'Bearer ${apiKey}' to the headers instead of supplying the API key in the URL, but got the same result.

Solution

I'm leaving this here in case someone faces this use case in the future:

While I could not find the answer I wanted, I did find a workaround: using a local Node.js server. It adds another layer of complexity (and another service that has to be maintained), but it let me perform the task I wanted.

I just post the request to the local Node.js server. It reads the base64-encoded audio data and the parameters for the Google Cloud request, generates an access token using a service account I set up, sends the request to the Google Cloud Speech-to-Text API, and awaits the result. When a response is received, it simply passes it back as the response to the original POST request.
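
For anyone who wants a starting point, the flow described above can be sketched roughly as follows. This is not my exact server, only an illustration using Node's built-in http module. getAccessToken is a placeholder for however you turn your service account credentials into an OAuth 2.0 access token (for example through a Google auth library, not shown here), and createProxyServer, forwardToSpeech, and port 3000 are hypothetical names and values:

```javascript
const http = require("http");

const SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize";

// Permissive CORS headers so a WebGL page on another origin can call the proxy.
// Tighten the allowed origin for anything beyond local testing.
const CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type",
    "Access-Control-Allow-Methods": "POST, OPTIONS"
};

// Forward the JSON body received from the browser to Google, unchanged.
// getAccessToken: placeholder async function returning an OAuth 2.0 access token.
// fetchImpl: the fetch function to use (global fetch on Node 18+).
async function forwardToSpeech(requestBody, getAccessToken, fetchImpl) {
    const token = await getAccessToken();
    const response = await fetchImpl(SPEECH_URL, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token
        },
        body: requestBody
    });
    return { status: response.status, body: await response.text() };
}

function createProxyServer(getAccessToken, fetchImpl) {
    return http.createServer(function (req, res) {
        if (req.method === "OPTIONS") {
            // Answer the CORS preflight the browser sends before a JSON POST.
            res.writeHead(204, CORS_HEADERS);
            res.end();
            return;
        }
        if (req.method !== "POST") {
            res.writeHead(405, CORS_HEADERS);
            res.end();
            return;
        }
        const headers = Object.assign({ "Content-Type": "application/json" }, CORS_HEADERS);
        const chunks = [];
        req.on("data", function (chunk) { chunks.push(chunk); });
        req.on("end", function () {
            const body = Buffer.concat(chunks).toString("utf8");
            forwardToSpeech(body, getAccessToken, fetchImpl)
                .then(function (result) {
                    // Pass Google's status and body straight back to the caller.
                    res.writeHead(result.status, headers);
                    res.end(result.body);
                })
                .catch(function (error) {
                    res.writeHead(502, headers);
                    res.end(JSON.stringify({ error: { message: String(error) } }));
                });
        });
    });
}

// Usage (myGetAccessToken is something you have to supply yourself):
//   createProxyServer(myGetAccessToken, fetch).listen(3000);
```

The browser code then posts the same JSON body to the local server instead of speech.googleapis.com, and no key or token has to live in the WebGL build.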