
Watson speech to text live stream C# code example


I'm trying to build an app in C# that will take an audio stream (from a file for now, but later it will be a web stream) and return transcriptions from Watson in real time as they become available, similar to the demo at https://speech-to-text-demo.mybluemix.net/

Does anyone know where I can find some sample code, preferably in C#, that could help me get started?

I tried this, based on the limited documentation at https://github.com/watson-developer-cloud/dotnet-standard-sdk/tree/development/src/IBM.WatsonDeveloperCloud.SpeechToText.v1, but I get a BadRequest error when I call RecognizeWithSession. I'm not sure if I'm on the right path here.

    static void StreamingRecognize(string filePath)
    {
        SpeechToTextService _speechToText = new SpeechToTextService();
        _speechToText.SetCredential("<user>", "<pw>");
        var session = _speechToText.CreateSession("en-US_BroadbandModel");

        //returns initialized
        var recognizeStatus = _speechToText.GetSessionStatus(session.SessionId);

        //  set up observe
        var taskObserveResult = Task.Factory.StartNew(() =>
        {
            var result = _speechToText.ObserveResult(session.SessionId);
            return result;
        });

        //  get results
        taskObserveResult.ContinueWith((antecedent) =>
        {
            var results = antecedent.Result;
        });

        var metadata = new Metadata();
        metadata.PartContentType = "audio/wav";
        metadata.DataPartsCount = 1;
        metadata.Continuous = true;
        metadata.InactivityTimeout = -1;
        var taskRecognizeWithSession = Task.Factory.StartNew(() =>
        {
            using (FileStream fs = File.OpenRead(filePath))
            {
                _speechToText.RecognizeWithSession(session.SessionId, "audio/wav", metadata, fs, "chunked");
            }
        });
    }

Solution

  • Each Watson Developer Cloud SDK includes an Examples folder for its programming language, and that folder contains an example that uses Speech to Text.

    The SDK supports WebSockets, which fit your requirement better than uploading an audio file: you stream the audio and receive transcriptions in near real time.

    static void Main(string[] args)
    {
        Transcribe();
        Console.WriteLine("Press any key to exit");
        Console.ReadLine();
    }

    // http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml
    static String username = "<username>";
    static String password = "<password>";

    static String file = @"c:\audio.wav";

    static Uri url = new Uri("wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize");

    // these should probably be private classes that use DataContractJsonSerializer
    // see https://msdn.microsoft.com/en-us/library/bb412179%28v=vs.110%29.aspx
    // or the ServiceState class at the end
    static ArraySegment<byte> openingMessage = new ArraySegment<byte>(Encoding.UTF8.GetBytes(
        "{\"action\": \"start\", \"content-type\": \"audio/wav\", \"continuous\" : true, \"interim_results\": true}"
    ));
    static ArraySegment<byte> closingMessage = new ArraySegment<byte>(Encoding.UTF8.GetBytes(
        "{\"action\": \"stop\"}"
    ));
    // ... more in the link below
    
    • Access the C# SDK here.
    • See the API reference for more information here.
    • A full Speech to Text example from an IBM Watson developer is here.
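
To show how the opening and closing messages above fit into a complete exchange, here is a minimal sketch of the WebSocket flow using the standard `System.Net.WebSockets.ClientWebSocket` class. It is not the linked example. The class name `WatsonStreamingSketch` and its helpers are hypothetical, the endpoint and JSON messages are copied from the snippet above, and it assumes that the service accepts basic-auth credentials and sends a `{"state": "listening"}` message both when it is ready and again once the final results have been delivered.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

static class WatsonStreamingSketch
{
    // Endpoint and control messages copied from the snippet above.
    static readonly Uri Url = new Uri("wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize");
    const string StartJson = "{\"action\": \"start\", \"content-type\": \"audio/wav\", \"continuous\": true, \"interim_results\": true}";
    const string StopJson = "{\"action\": \"stop\"}";

    // Hypothetical helper: UTF-8 encodes a string into a segment for SendAsync.
    public static ArraySegment<byte> Utf8(string text)
    {
        return new ArraySegment<byte>(Encoding.UTF8.GetBytes(text));
    }

    // Assumption: the service reports {"state": "listening"} when ready and when done.
    public static bool IsListeningState(string message)
    {
        return message.Contains("\"state\"") && message.Contains("\"listening\"");
    }

    public static async Task TranscribeAsync(string username, string password, Stream audio)
    {
        using (var ws = new ClientWebSocket())
        {
            ws.Options.Credentials = new NetworkCredential(username, password);
            await ws.ConnectAsync(Url, CancellationToken.None);

            // 1. Start the request and wait until the service says it is listening.
            await ws.SendAsync(Utf8(StartJson), WebSocketMessageType.Text, true, CancellationToken.None);
            await ReceiveUntilListeningAsync(ws);

            // 2. Send audio in binary chunks while results are printed as they arrive.
            Task receiving = ReceiveUntilListeningAsync(ws);
            var buffer = new byte[8192];
            int read;
            while ((read = await audio.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await ws.SendAsync(new ArraySegment<byte>(buffer, 0, read),
                    WebSocketMessageType.Binary, true, CancellationToken.None);
            }

            // 3. Signal end of audio, drain the remaining results, then close.
            await ws.SendAsync(Utf8(StopJson), WebSocketMessageType.Text, true, CancellationToken.None);
            await receiving;
            await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "done", CancellationToken.None);
        }
    }

    // Prints every text message until a "listening" state message or a close frame arrives.
    static async Task ReceiveUntilListeningAsync(ClientWebSocket ws)
    {
        var buffer = new byte[4096];
        while (true)
        {
            using (var message = new MemoryStream())
            {
                WebSocketReceiveResult result;
                do
                {
                    result = await ws.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                    if (result.MessageType == WebSocketMessageType.Close) return;
                    message.Write(buffer, 0, result.Count);
                } while (!result.EndOfMessage);

                string text = Encoding.UTF8.GetString(message.ToArray());
                Console.WriteLine(text);
                if (IsListeningState(text)) return;
            }
        }
    }
}
```

From `Main` you could call it with something like `WatsonStreamingSketch.TranscribeAsync("<username>", "<password>", File.OpenRead(@"c:\audio.wav")).Wait();`. Because the method takes a `Stream`, the same loop should work later with a web stream instead of a file, as long as the audio format matches the `content-type` in the start message.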