c#, .net, speech-to-text, google-speech-api, google-speech-to-text-api

How to use Google Cloud Speech (V1 API) for speech-to-text: need to process audio files over 3 hours long properly and efficiently


I have been looking through the documentation but have not found a solution yet.

I installed the NuGet package.

I also generated an API key.

However, I can't find proper documentation on how to use the API key.

Moreover, I want to be able to upload very long audio files.

So what is the proper way to upload audio files of up to 3 hours and get their results?

I have a $300 budget, so that should be enough.

Here is my code so far.

It currently fails because I have not set the credentials correctly, and I don't know how to.

I also have a service account file ready to use.

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
    }

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        var speech = SpeechClient.Create();           
        
        var config = new RecognitionConfig
        {               
            Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
            SampleRateHertz = 48000,
            LanguageCode = LanguageCodes.English.UnitedStates
        };
        var audio = RecognitionAudio.FromStorageUri("1m.flac");

        var response = speech.Recognize(config, audio);

        foreach (var result in response.Results)
        {
            foreach (var alternative in result.Alternatives)
            {
                Debug.WriteLine(alternative.Transcript);
            }
        }
    }
}


I don't want to set an environment variable. I have both an API key and a service account JSON file. How can I set the credentials manually?


Solution

  • You need to use the SpeechClientBuilder to create a SpeechClient with custom credentials, if you don't want to use the environment variable. Assuming you've got a service account file somewhere, change this:

    var speech = SpeechClient.Create();
    

    to this:

    var speech = new SpeechClientBuilder
    {
        CredentialsPath = "/path/to/your/file"
    }.Build();
    

    Note that to perform a long-running recognition operation, you should also use the LongRunningRecognize method. I strongly suspect your current RPC will fail, either explicitly because it's trying to run on a file that's too large, or because it will simply time out.
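
For illustration, here is a minimal sketch of how the builder and LongRunningRecognize might fit together. It rests on several assumptions that are not in the question: the audio has already been uploaded to a Cloud Storage bucket (as far as I know, audio longer than roughly a minute has to be referenced by a gs:// URI rather than sent inline), the bucket and object names are placeholders, and the class and method names are hypothetical. Treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using Google.Cloud.Speech.V1;

// Hypothetical helper class, not part of the original question.
public static class LongAudioTranscriber
{
    public static void Transcribe()
    {
        // Build the client from the service account file instead of
        // relying on the environment variable.
        var speech = new SpeechClientBuilder
        {
            CredentialsPath = "/path/to/your/file"
        }.Build();

        var config = new RecognitionConfig
        {
            Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
            SampleRateHertz = 48000,
            LanguageCode = LanguageCodes.English.UnitedStates
        };

        // Assumption: placeholder bucket and object names. The file must
        // already exist in Cloud Storage under this URI.
        var audio = RecognitionAudio.FromStorageUri("gs://your-bucket/your-audio.flac");

        // Start the long-running operation, then block until it finishes.
        var operation = speech.LongRunningRecognize(config, audio);
        var completed = operation.PollUntilCompleted();

        foreach (var result in completed.Result.Results)
        {
            foreach (var alternative in result.Alternatives)
            {
                Console.WriteLine(alternative.Transcript);
            }
        }
    }
}
```

In a WPF button handler, blocking on PollUntilCompleted would freeze the UI for as long as the transcription takes, so the asynchronous variants (LongRunningRecognizeAsync and PollUntilCompletedAsync) are probably a better fit there.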