
How to send audio to Dialogflow using the C# library (DetectIntent)


I am using the Dialogflow C# Library Google.Cloud.Dialogflow.V2 to communicate with my Dialogflow Agent.

Everything works fine when sending text using DetectIntentAsync().

My issue is with sending audio, more precisely audio in the .AAC format.

To send audio using DetectIntentAsync(), we need to create a DetectIntentRequest like below:


            DetectIntentRequest detectIntentRequest = new DetectIntentRequest
            {
                InputAudio = **HERE WHERE I HAVE AN ISSUE**,
                QueryInput = queryInput,
                Session = "projects/" + _sessionName.ProjectId + "/agent/sessions/" + _sessionName.SessionId
            };

Where the QueryInput is configured with AudioConfig like below

            QueryInput queryInput = new QueryInput
            {
                AudioConfig = audioConfig,
            };

Where the AudioConfig is configured like below

            var audioConfig = new InputAudioConfig
            {
                AudioEncoding = **HAVING ISSUE HERE ON HOW TO CHOOSE THE CORRECT ENCODING**,
                LanguageCode = "en-US",
                ModelVariant = SpeechModelVariant.Unspecified,
                SampleRateHertz = **HAVING ISSUE HERE ON HOW TO CHOOSE THE CORRECT SAMPLE RATE HERTZ**,
            };

PROBLEM

  • How to figure out what SampleRateHertz to choose?
  • How to figure out what AudioEncoding to choose?
  • How to provide the correct Protobuf.ByteString to InputAudio?
  • What if I want to use formats other than .AAC? How can the needed info be provided automatically?

WHAT I TESTED

I got the byte[] from a URL:

// THE AUDIO IS A .AAC FILE
string audio = "https://cdn.fbsbx.com/v/t59.3654-21/72342591_3243833722299817_3308062589669343232_n.aac/audioclip-1575911942672-2279.aac?_nc_cat=102&_nc_ohc=heP60KND_DMAQl5-tE77rKNtUzHw_aILXdKfPPejdr7YVqzbYLQRv9BWA&_nc_ht=cdn.fbsbx.com&oh=1c4dbf0a64e0d1fb057b79354c17ca1c&oe=5DF17429";
byte[] audioBytes;
using (var webClient = new WebClient())
{
    audioBytes = webClient.DownloadData(audio);
}

Then I added that to the DetectIntentRequest like below:

            DetectIntentRequest detectIntentRequest = new DetectIntentRequest
            {
                InputAudio = Google.Protobuf.ByteString.CopyFrom(audioBytes),
                QueryInput = queryInput,
                Session = "projects/" + _sessionName.ProjectId + "/agent/sessions/" + _sessionName.SessionId
            };

If I do not specify the SampleRateHertz, I get the following error:

Error: "{"Status(StatusCode=InvalidArgument, Detail=\"Invalid input audio or config. Unable to calculate audio duration. Possibly no audio data sent.\")"} "

The error went away once I specified a SampleRateHertz value, but this is the response I keep getting no matter what values I use for AudioEncoding and SampleRateHertz:

Response: {{ "languageCode": "en" }}

Everything else in the DetectIntentResponse is null.

Any guidance or help is appreciated.

Thank you


Solution

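  • For anyone facing the .AAC issue with Dialogflow, I got it working by first converting the AAC audio to a WAV file with NAudio (MediaFoundationReader and WaveFileWriter) and then sending the WAV bytes, leaving AudioEncoding and SampleRateHertz unset: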

    // Requires: using System.IO; using NAudio.Wave; using Google.Cloud.Dialogflow.V2;
    // LanguageCode, path, userAudioInput, _sessionName and _sessionsClient are defined elsewhere.
    var queryAudio = new InputAudioConfig
    {
        LanguageCode = LanguageCode,
        ModelVariant = SpeechModelVariant.Unspecified,
    };

    QueryInput queryInput = new QueryInput
    {
        AudioConfig = queryAudio,
    };

    var filename = "fileName.wav";
    // userAudioInput is the .AAC URL string.
    // Decode the AAC audio and save it as a WAV file.
    using (var reader = new MediaFoundationReader(userAudioInput))
    {
        Directory.CreateDirectory(path);
        WaveFileWriter.CreateWaveFile(Path.Combine(path, filename), reader);
    }
    // Read back the WAV file that was just saved.
    byte[] inputAudio = File.ReadAllBytes(Path.Combine(path, filename));

    DetectIntentRequest detectIntentRequest = new DetectIntentRequest
    {
        InputAudio = Google.Protobuf.ByteString.CopyFrom(inputAudio),
        QueryInput = queryInput,
        Session = "projects/" + _sessionName.ProjectId + "/agent/sessions/" + _sessionName.SessionId
    };

    // Make the request
    DetectIntentResponse response = await _sessionsClient.DetectIntentAsync(detectIntentRequest);
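
On the question of how to find the SampleRateHertz for a given file: once the audio is in WAV form, the sample rate, channel count and bit depth are stored in the file's "fmt " chunk, so they can be read rather than guessed. Below is a minimal sketch of that idea. WavHeader and WavHeader.Info are hypothetical helper names, not part of NAudio or the Dialogflow library, and the code assumes a seekable stream holding a standard RIFF/WAVE file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of NAudio or Google.Cloud.Dialogflow.V2):
// reads the format fields from the "fmt " chunk of a RIFF/WAVE stream.
public static class WavHeader
{
    public sealed class Info
    {
        public int FormatTag;      // 1 means uncompressed PCM
        public int Channels;
        public int SampleRate;     // samples per second
        public int BitsPerSample;
    }

    // Assumes the stream is seekable and positioned at the start of the file.
    public static Info Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
            reader.ReadUInt32(); // overall RIFF size, not needed here
            if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

            // Walk the chunks until "fmt " is found; skip everything else.
            // Running off the end of the stream throws EndOfStreamException.
            while (true)
            {
                string chunkId = ReadTag(reader);
                uint chunkSize = reader.ReadUInt32();
                if (chunkId == "fmt ")
                {
                    var info = new Info
                    {
                        FormatTag = reader.ReadUInt16(),
                        Channels = reader.ReadUInt16(),
                        SampleRate = (int)reader.ReadUInt32(),
                    };
                    reader.ReadUInt32(); // byte rate, not needed here
                    reader.ReadUInt16(); // block align, not needed here
                    info.BitsPerSample = reader.ReadUInt16();
                    return info;
                }
                // Chunks are padded to an even number of bytes.
                long skip = (long)chunkSize + (chunkSize & 1);
                stream.Seek(skip, SeekOrigin.Current);
            }
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new EndOfStreamException("Unexpected end of WAV data.");
        return Encoding.ASCII.GetString(bytes);
    }
}
```

With the converted bytes from the solution above, this could be called as WavHeader.Read(new MemoryStream(inputAudio)). If FormatTag is 1 and BitsPerSample is 16, the audio is 16-bit linear PCM, so setting AudioEncoding = AudioEncoding.Linear16 and SampleRateHertz = info.SampleRate on the InputAudioConfig should describe it correctly. Whether setting these explicitly behaves any differently from leaving them unset, as the solution does, is something I have not verified.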