Guidance Needed: Voice Activity Detection and Timeout Handling with Cloud Speech-to-Text API C++ Client Library

My scenario is quite basic: after the user's wake word (I don't use Google Speech for that), I start streaming microphone data to the Google Speech Recognition service. The problem I'm trying to solve is this: if the user activates my voice assistant and then says nothing, I want to stop sending silence to the service. I believe VoiceActivityTimeout is designed for exactly this purpose. However, as I'm not proficient with protobuf, I'm unsure how to set it up or how it is supposed to behave.

This is what I tried. I took the streaming_transcribe_singlethread.cc example and modified it:

  private:
    Handler(google::cloud::CompletionQueue cq, ParseResult args)
        : cq_(std::move(cq)), file_(args.path, std::ios::binary) {
      auto& streaming_config = *request_.mutable_streaming_config();
      *streaming_config.mutable_config() = std::move(args.config);

      // My changes start here
      streaming_config.set_single_utterance(true);
      streaming_config.set_enable_voice_activity_events(true);
      auto& timeout = *streaming_config.mutable_voice_activity_timeout();
      auto& duration = *timeout.mutable_speech_start_timeout();
      duration.set_seconds(1);
      // End of my changes

I expected this code to stop the transcription right after the first chunk of data (the file contains 10 seconds of silence, and I want speech activity to be detected within the first second). Instead, I get the following:

./streaming_transcribe_singlethread silence.wav 
Sending 64k bytes.
Sending 64k bytes.
Sending 64k bytes.
Error in transcribe stream: CANCELLED: The operation was cancelled.

Attached file: silence.wav

I agree that if the stream is closed by the timeout, such a result might be expected. However, I believe the initial chunk of data (approximately 3 seconds) should be enough to detect the silence. Unfortunately, changing the timeout value doesn't seem to affect the outcome.

I'm pretty sure I'm doing something wrong. Unfortunately, I couldn't find any examples of how to set and handle timeouts. Could you please explain how it should work and what I'm doing wrong?

Thank you very much!

Solution

Setting the channel count explicitly in the recognition config solved my problem:

config.set_audio_channel_count(1);