Tags: c#, .net, speech-to-text, azure-cognitive-services

Bot Framework - Using Custom Speech Service Error 400 C#


I created a bot with Bot Framework, and now I'm trying to use the Custom Speech service instead of the Bing SpeechToText service, which works fine. I have tried various ways to resolve the problem, but I keep getting error 400 and I don't know how to solve it.

This is the method that should return the text from a Stream of WAV PCM audio:

    public static async Task<string> CustomSpeechToTextStream(Stream audioStream)
    {
        audioStream.Seek(0, SeekOrigin.Begin);

        var customSpeechUrl = "https://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=<MyEndPointId>";
        string token;

        token = GetToken();

        HttpWebRequest request = null;
        request = (HttpWebRequest)HttpWebRequest.Create(customSpeechUrl);
        request.SendChunked = true;
        //request.Accept = @"application/json;text/xml";
        request.Method = "POST";
        request.ProtocolVersion = HttpVersion.Version11;
        request.ContentType = "audio/wav; codec=\"audio/pcm\"; samplerate=16000";
        request.Headers["Authorization"] = "Bearer " + token;

        byte[] buffer = null;
        int bytesRead = 0;
        using (Stream requestStream = request.GetRequestStream())
        {
            // Copy the input audio to the request in chunks of up to 1024 bytes.
            buffer = new Byte[checked((uint)Math.Min(1024, (int)audioStream.Length))];
            while ((bytesRead = audioStream.Read(buffer, 0, buffer.Length)) != 0)
            {
                requestStream.Write(buffer, 0, bytesRead);
            }

            requestStream.Flush();
        }

        string responseString = string.Empty;

        // Get the response from the service.
        using (WebResponse response = request.GetResponse()) // This call throws the exception (error 400)
        {
            using (StreamReader sr = new StreamReader(response.GetResponseStream()))
            {
                responseString = sr.ReadToEnd();
            }
        }

        dynamic deserializedResponse = Newtonsoft.Json.JsonConvert.DeserializeObject(responseString);

        if (deserializedResponse.RecognitionStatus == "Success")
        {
            return deserializedResponse.DisplayText;
        }
        else
        {
            return null;
        }
    }

The call to request.GetResponse() throws an exception (error 400).

Am I building the HttpWebRequest correctly?

I read online that the audio file might be the problem, but then why doesn't the Bing Speech service return this error for the same Stream?


Solution

  • In my case the problem was that my WAV audio stream didn't have the file header that CRIS (Custom Speech Service) needs. The solution is to create a temporary WAV file, read it back, and copy it into a Stream to send to CRIS as a byte array:

    byte[] buffer = null;
    int bytesRead = 0;
    using (Stream requestStream = request.GetRequestStream())
    {
        buffer = new Byte[checked((uint)Math.Min(1024, (int)audioStream.Length))];
        while ((bytesRead = audioStream.Read(buffer, 0, buffer.Length)) != 0)
        {
            requestStream.Write(buffer, 0, bytesRead);
        }
    
        requestStream.Flush();
    }
    

    Alternatively, copy it into a MemoryStream and send it as a byte array:

    using (Stream requestStream = request.GetRequestStream())
    {
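        // audioStream must be a MemoryStream here; call ToArray() once instead of copying the buffer twice.
        byte[] audioBytes = audioStream.ToArray();
        requestStream.Write(audioBytes, 0, audioBytes.Length);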
        requestStream.Flush();
    }
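
    If you would rather not go through a temporary file, the missing header can probably be added in memory. The following is only a sketch, not the code I actually used: it assumes the stream holds raw 16 kHz, 16-bit, mono PCM (the sample rate matches the samplerate=16000 in the Content-Type above; the bit depth and channel count are assumptions), and the class and method names (WavHeaderSketch, HasWavHeader, EnsureWavHeader) are made up for illustration. It checks for the RIFF/WAVE signature and, if it is absent, prepends a standard 44-byte PCM header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any Microsoft SDK.
public static class WavHeaderSketch
{
    // True when the buffer starts with the "RIFF" .... "WAVE" signature of a WAV container.
    public static bool HasWavHeader(byte[] audio)
    {
        return audio.Length >= 12
            && Encoding.ASCII.GetString(audio, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(audio, 8, 4) == "WAVE";
    }

    // Returns the audio unchanged if it already has a header; otherwise prepends
    // a 44-byte PCM header. The default format values are assumptions about the audio.
    public static byte[] EnsureWavHeader(byte[] audio, int sampleRate = 16000,
                                         short channels = 1, short bitsPerSample = 16)
    {
        if (HasWavHeader(audio)) return audio;

        short blockAlign = (short)(channels * bitsPerSample / 8);
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms, Encoding.ASCII)) // BinaryWriter writes little-endian
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + audio.Length);        // size of everything after this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                       // fmt chunk size for PCM
            w.Write((short)1);                 // audio format 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(sampleRate * blockAlign);  // byte rate
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(audio.Length);             // size of the PCM payload
            w.Write(audio);
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

    With audioStream as a MemoryStream, you would call WavHeaderSketch.EnsureWavHeader(audioStream.ToArray()) and write the returned array to requestStream exactly as in the snippet above. If the real audio format differs from the assumed one, the header values have to be changed to match, otherwise the service may still reject or misread the audio.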