
Google Speech API streaming


I'm trying to connect my PBX IVR to the Google Speech API using the syncrecognize method. Because I have to record the voice, send it to the API, wait for the response and then process it, it is impossible to hold anything like a normal conversation or to process voice for real-time services. Is there another API that is recommended, or is there a way to set up VoIP/streaming to Google, similar to Alexa/Google Home? I didn't find anything for the RecognitionAudio object:


Solution

  • The Google Cloud Speech API supports two different kinds of recognition:

    • Non-streaming recognition: you send the complete audio to Google and receive the result once it has been processed.
    • Streaming recognition: you feed audio interactively (in real time) and are notified of results (interim and final) while the audio chunks are being processed.

    Both of these operations can be invoked in two ways:

    • Synchronous: issue the call and wait for the result (suitable for standard recognition of short files).
    • Asynchronous: issue the call, then wait to be notified of, or check for, its result (usually on a different thread or in a multithreaded environment; this mode is mandatory for audio longer than one minute).
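The difference between the two calling styles can be sketched in a few lines of Python using only the standard library. This is an illustration of the calling pattern, not of the Speech API itself: `fake_recognize` is a hypothetical stand-in for a recognition request and does not correspond to any real client function.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_recognize(audio: bytes) -> str:
    """Hypothetical stand-in for a recognition request (not a real API call)."""
    return f"transcript of {len(audio)} bytes"


# Synchronous style: the caller blocks until the result is available.
sync_result = fake_recognize(b"\x00" * 320)

# Asynchronous style: the caller submits the request and is notified
# through a callback once the result is ready.
notifications = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fake_recognize, b"\x00" * 640)
    future.add_done_callback(lambda done: notifications.append(done.result()))
# Leaving the "with" block waits for the worker thread to finish,
# so the callback has run by this point.
```

In an IVR the asynchronous style is usually the better fit, because the thread handling the call audio is never blocked waiting for a transcript.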

    Streaming recognition is the recommended option for your scenario. Note that it is accessible only through the Cloud Speech RPC API (gRPC), whereas non-streaming recognition is available through both the Cloud Speech REST and RPC APIs.
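Feeding audio "interactively" means sending it as a sequence of small chunks rather than as one file. A minimal sketch of that chunking step, assuming raw mono 16-bit PCM at 8 kHz (a common telephony format) and 100 ms chunks, might look like the following. The function name and all default values are assumptions made for illustration; the code only prepares the chunks and does not call the Speech API.

```python
from typing import Iterator


def audio_chunks(
    pcm: bytes,
    sample_rate_hz: int = 8000,   # assumed telephony sample rate
    bytes_per_sample: int = 2,    # assumed 16-bit linear PCM, mono
    chunk_ms: int = 100,          # assumed chunk duration
) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM audio, each about chunk_ms long."""
    chunk_size = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    for offset in range(0, len(pcm), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield pcm[offset:offset + chunk_size]


# Example: one second of silence plus a 100-byte tail.
one_second = b"\x00" * 16000
chunks = list(audio_chunks(one_second + b"\x00" * 100))
```

Each yielded chunk would then be wrapped in a streaming request by whichever gRPC client you use; in a live call the chunks would come from the PBX media stream instead of an in-memory buffer.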

    We used AsyncStreamingRecognize via the gRPC API in a similar application. The project was in C++, and setting up the environment was long and complex: you need to download and build gRPC, protoc and the Google Speech API stubs/libraries for your platform. We used the examples available on the Google Speech API site as a reference; once the environment was ready, adapting the sample application logic to our specific scenario was quite simple.

    With the streaming API there are limits on audio encoding (not all encodings are supported) and on the length of the audio processed (a stream can handle up to one minute of speech). In addition, you can access the API only with a Service Account that has been enabled for the Speech API.
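One way to live with the one-minute limit is to close the stream and open a new one before the limit is reached. The sketch below groups audio chunks into sessions that each stay under a time budget. The function name, the 100 ms chunk duration and the 55-second budget are assumptions chosen for illustration, not values taken from the API documentation.

```python
from typing import Iterable, Iterator, List


def split_into_sessions(
    chunks: Iterable[bytes],
    chunk_ms: int = 100,            # assumed duration of each chunk
    max_session_ms: int = 55_000,   # assumed budget, kept below one minute
) -> Iterator[List[bytes]]:
    """Group chunks into lists whose total duration fits within the budget."""
    session: List[bytes] = []
    elapsed_ms = 0
    for chunk in chunks:
        # Start a new session if this chunk would exceed the budget.
        if session and elapsed_ms + chunk_ms > max_session_ms:
            yield session
            session, elapsed_ms = [], 0
        session.append(chunk)
        elapsed_ms += chunk_ms
    if session:
        yield session


# Example: two minutes of audio in 100 ms chunks of 8 kHz 16-bit PCM.
two_minutes = [b"\x00" * 1600 for _ in range(1200)]
session_sizes = [len(s) for s in split_into_sessions(two_minutes)]
```

In an IVR it is usually preferable to end a session at a natural pause, such as the end of a caller's utterance, rather than at a fixed time, so that words are not cut in half across two streams.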