javascript, http-post, google-cloud-speech, unity-webgl

How to send an HTTP POST request to Google Cloud Speech-to-Text using plain JavaScript


I am creating a Unity WebGL build and need to use the Google Cloud Speech-to-Text API in my app. Unity does not support microphones in WebGL builds, but there is a workaround: a jslib file that accesses the microphone through JavaScript. The problem is that, no matter where I look, I cannot find documentation on how to submit the recorded audio to Google Cloud for processing with an HTTP POST request in plain JavaScript (I cannot easily use other methods or libraries, and I don't want this code to be too complicated). I created an API key (which might not be necessary), but every request I send, such as the one below, returns 400 Bad Request or a similar error:

fetch("https://speech.googleapis.com/v1/speech:recognize?key=API_KEY", {
    method: "POST",
    body: JSON.stringify(payload),
    headers: {
        "Content-Type": "application/json"
    }
})
.then(response => response.json())
.then(data => {
    // Process the response
    processResponse(data);
})
.catch(error => {
    console.error('Error:', error);
});
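
A 400 from this endpoint normally comes with a JSON body that says what was wrong with the request, and the snippet above never looks at it. Below is a minimal sketch of a helper that surfaces it, assuming the usual Google API error shape (`{ error: { code, message, status } }`). The name `describeSpeechResponse` is hypothetical, not part of any library.

```javascript
// Hypothetical helper: turn a parsed Speech-to-Text response into either
// a transcript string or a readable error message.
function describeSpeechResponse(data) {
  // Google APIs usually report failures as { error: { code, message, status } }.
  if (data && data.error) {
    const { code, status, message } = data.error;
    return { ok: false, text: `${code} ${status}: ${message}` };
  }
  // A successful call with no recognized speech may omit `results` entirely.
  const results = (data && data.results) || [];
  const transcript = results
    .map(r => (r.alternatives && r.alternatives[0] ? r.alternatives[0].transcript : ""))
    .join(" ")
    .trim();
  return { ok: true, text: transcript };
}
```

In the fetch chain this could replace the direct property access, for example `.then(data => console.log(describeSpeechResponse(data)))`, so that the message behind the 400 shows up in the console.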

Heck, I even tried asking ChatGPT 4 and got no answer. I admit I am not a JavaScript person, let alone an expert, so if anyone is familiar with creating such requests, please share your knowledge with me. Thank you!

EDIT: Since it appears to be somewhat unclear, this is the full code (I don't care about conventions or styling at the moment, I need the core functionality to work first):

StartRecording: function () {
    console.log("Beginning of StartRecording");

    // Send the recorded audio to the Google Cloud Speech-to-Text API.
    var sendToSpeechToText = function (blob) {
        console.log("Beginning of SendToSpeechToText");
        const apiKey = '<REMOVED>'; // Replace with your Google Cloud API key
        const url = `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`;
        const reader = new FileReader();

        reader.onload = function () {
            // reader.result is a data URL; keep only the base64 payload.
            const base64data = reader.result;
            const audioBytes = base64data.split('base64,')[1];

            const requestData = {
                config: {
                    encoding: 'WEBM_OPUS',
                    // Should match the rate the recorder actually produced.
                    sampleRateHertz: 16000,
                    languageCode: 'en-US'
                },
                audio: {
                    content: audioBytes
                }
            };

            fetch(url, {
                method: 'POST',
                body: JSON.stringify(requestData),
                headers: {
                    'Content-Type': 'application/json'
                }
            })
            .then(response => response.json())
            .then(data => {
                console.log("Data Received!");
                // On failure the body holds an `error` object instead of `results`.
                if (data.error) {
                    console.error('API error:', data.error);
                    return;
                }
                // Process the response data (transcript)
                const results = data.results || [];
                if (results.length > 0) {
                    window.alert(results[0].alternatives[0].transcript);
                }
            })
            .catch(error => console.error('Error:', error));
        };
        reader.readAsDataURL(blob);
        console.log("End of SendToSpeechToText");
    };

    var handleSuccess = function (stream) {
        console.log("Beginning of HandleSuccess");
        const options = {
            mimeType: 'audio/webm'
        };
        const recordedChunks = [];
        const mediaRecorder = new MediaRecorder(stream, options);

        mediaRecorder.addEventListener('dataavailable', function (e) {
            if (e.data.size > 0) recordedChunks.push(e.data);
        });

        mediaRecorder.addEventListener('stop', function () {
            // Carry the recorder's MIME type over to the combined Blob.
            sendToSpeechToText(new Blob(recordedChunks, { type: mediaRecorder.mimeType }));
        });

        mediaRecorder.start();
        // For example, stop recording after 5 seconds
        setTimeout(() => {
            mediaRecorder.stop();
        }, 5000);

        console.log("End of HandleSuccess");
    };

    navigator.mediaDevices.getUserMedia({
        audio: {
            deviceId: "default",
            sampleRate: 16000,
            sampleSize: 16,
            channelCount: 1
        },
        video: false
    })
    .then(handleSuccess)
    .catch(error => console.error('Microphone error:', error));
    console.log("End of StartRecording");
}
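
One way to make the request easier to debug is to pull the body construction out of the `FileReader` callback so it can be logged and checked before it is sent. This is only a sketch: `buildRecognizeRequest` is a hypothetical name, and the sample rate is passed in explicitly on the assumption that it has to agree with what the recorder really produced (browsers may ignore the `sampleRate` constraint given to `getUserMedia`).

```javascript
// Hypothetical helper: build the JSON body for speech:recognize from the
// data URL that FileReader.readAsDataURL() produces.
function buildRecognizeRequest(dataUrl, sampleRateHertz, languageCode) {
  const marker = "base64,";
  const idx = dataUrl.indexOf(marker);
  // Fail early instead of sending `content: undefined` to the API.
  if (idx === -1) {
    throw new Error("Expected a base64 data URL");
  }
  return {
    config: {
      encoding: "WEBM_OPUS",
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode
    },
    // Everything after "base64," is the encoded audio itself.
    audio: { content: dataUrl.slice(idx + marker.length) }
  };
}
```

Inside `reader.onload` it would be called as `buildRecognizeRequest(reader.result, 16000, 'en-US')`, with the rate adjusted to whatever the recording actually uses.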

I also tried adding Authorization: 'Bearer ${apiKey}' to the headers instead of supplying the API key in the URL, but got the same result.
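
For what it's worth, an API key and a bearer token are different kinds of credential: as far as I understand, the `key` query parameter takes an API key, while `Authorization: Bearer` expects an OAuth 2.0 access token, so putting the API key there would not be expected to work. Also, if the header was written with single quotes exactly as shown, `${apiKey}` is sent literally, since only backtick strings interpolate. A rough sketch of keeping the two paths apart (`buildAuth` and the `credentials` object are made-up names for illustration):

```javascript
// Hypothetical helper: decide where the credential goes depending on its kind.
function buildAuth(baseUrl, credentials) {
  const headers = { "Content-Type": "application/json" };
  if (credentials.accessToken) {
    // OAuth 2.0 access tokens go in the Authorization header.
    // Note the backticks: single quotes would not interpolate the value.
    headers["Authorization"] = `Bearer ${credentials.accessToken}`;
    return { url: baseUrl, headers: headers };
  }
  if (credentials.apiKey) {
    // API keys go in the `key` query parameter.
    return { url: `${baseUrl}?key=${encodeURIComponent(credentials.apiKey)}`, headers: headers };
  }
  throw new Error("Provide either apiKey or accessToken");
}
```

The result can be passed straight to `fetch(auth.url, { method: 'POST', headers: auth.headers, body: ... })`.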


Solution

  • I'm leaving this here in case someone faces this use case in the future:

    While I could not find the answer I wanted, I did find a workaround: a local Node.js server. It adds a layer of complexity (and another service to maintain), but it let me perform the task I wanted.

    I post the request to the local Node.js server, which reads the base64-encoded audio data and the parameters for the Google Cloud request, obtains credentials using a service account I set up, sends the request to the Google Cloud Speech-to-Text API, and awaits the result. When a response arrives, the server passes it back as the response to the original POST request.
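
    A rough sketch of what such a relay could look like, using only Node's built-in `http` module and the global `fetch` (Node 18+). This is not the exact server described above: `createProxy`, `buildUpstreamRequest`, and the injected `getAccessToken` function are hypothetical names, and how the service-account token is obtained is deliberately left out.

```javascript
const http = require("http");

const SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize";

// Build the upstream request from the client's JSON body and an access token.
// Only `config` and `audio` are forwarded; anything else is dropped.
function buildUpstreamRequest(clientBody, accessToken) {
  return {
    url: SPEECH_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${accessToken}`
      },
      body: JSON.stringify({ config: clientBody.config, audio: clientBody.audio })
    }
  };
}

// getAccessToken: assumed async function returning a service-account token.
// fetchImpl: injectable for testing; defaults to the global fetch.
function createProxy(getAccessToken, fetchImpl = fetch) {
  return http.createServer(async (req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405);
      res.end();
      return;
    }
    try {
      // Collect the request body as UTF-8 text.
      req.setEncoding("utf8");
      let raw = "";
      for await (const chunk of req) raw += chunk;

      const token = await getAccessToken();
      const { url, options } = buildUpstreamRequest(JSON.parse(raw), token);

      // Relay Google's status code and body back to the original caller.
      const upstream = await fetchImpl(url, options);
      const text = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "application/json" });
      res.end(text);
    } catch (err) {
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}
```

    Starting it would look something like `createProxy(myTokenProvider).listen(3000)`, where `myTokenProvider` is a placeholder for whatever returns the token (the google-auth-library package is one option). A page served from a different origin would also need CORS headers on the responses, which this sketch omits.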