c# azure azure-cognitive-services speech-to-text azure-speech

Azure - Speech To Text - detect speaker channel


I am using Azure Speech to Text continuous recognition to transcribe an audio file. My speakers are split across the left and right channels of a stereo WAV file. However, when I run the transcription I am not able to get the channel for each result. I tried to read it from PropertyId.SpeechServiceResponse_JsonResult, but the channel there is always 0. I expected 0 for the left channel and 1 for the right channel.

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

var speechConfig = SpeechConfig.FromSubscription("<api key>", "<region>");
var audioConfig = AudioConfig.FromWavFileInput(filePath);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Is there some hidden property or missing configuration to achieve this?

My attempt to read the channel from the JsonResult property:

// Raw JSON payload the service returned for this recognition result.
var speechServiceResponseJsonResultJson = eventArgs.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);

var channel = 0;
if (!string.IsNullOrEmpty(speechServiceResponseJsonResultJson))
{
    // Reuse the string fetched above instead of calling GetProperty a second time.
    var speechServiceResponseJsonResult =
        JsonConvert.DeserializeObject<JObject>(speechServiceResponseJsonResultJson);

    // Case-insensitive lookup; channel stays 0 when the field is absent.
    if (speechServiceResponseJsonResult.TryGetValue("Channel", StringComparison.InvariantCultureIgnoreCase, out var channelValue))
    {
        channel = channelValue.ToObject<int>();
    }
}

Solution

  • There is currently no way to get the channel through the SDK. I ended up splitting the audio by channel, transcribing each channel separately, and then combining the results, tagging each record with the channel it originated from.
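
The split-and-merge workaround described above could look roughly like the sketch below. It is not the answerer's actual code. It assumes 16-bit PCM stereo data with the WAV header already stripped, and the names StereoSplitter, SplitPcm16, Merge, and ChannelSegment are hypothetical helpers introduced for illustration; none of them come from the Speech SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record for one transcribed phrase; not an SDK type.
public record ChannelSegment(int Channel, TimeSpan Offset, string Text);

public static class StereoSplitter
{
    // Splits interleaved 16-bit PCM stereo samples (L0 R0 L1 R1 ...) into two mono buffers.
    // Assumption: `pcm` contains only sample data, with no WAV header.
    public static (byte[] Left, byte[] Right) SplitPcm16(byte[] pcm)
    {
        const int bytesPerSample = 2;              // 16-bit samples
        const int frameSize = bytesPerSample * 2;  // one left sample + one right sample
        if (pcm.Length % frameSize != 0)
            throw new ArgumentException("Length must be a multiple of 4 bytes.", nameof(pcm));

        int frames = pcm.Length / frameSize;
        var left = new byte[frames * bytesPerSample];
        var right = new byte[frames * bytesPerSample];

        for (int i = 0; i < frames; i++)
        {
            // The first two bytes of each frame are the left sample, the next two the right.
            Buffer.BlockCopy(pcm, i * frameSize, left, i * bytesPerSample, bytesPerSample);
            Buffer.BlockCopy(pcm, i * frameSize + bytesPerSample, right, i * bytesPerSample, bytesPerSample);
        }
        return (left, right);
    }

    // Combines the per-channel results into a single timeline ordered by offset.
    // Each segment keeps the channel number it was tagged with (0 = left, 1 = right).
    public static List<ChannelSegment> Merge(IEnumerable<ChannelSegment> left, IEnumerable<ChannelSegment> right)
        => left.Concat(right).OrderBy(s => s.Offset).ToList();
}
```

Each mono buffer would then go to its own recognizer, for example by writing it to a mono WAV file or feeding it through the SDK's stream input, and each recognized phrase would be wrapped in a ChannelSegment with its channel number and the result's offset before merging. How the mono audio is fed to the recognizer is left open here, since the answer does not say which approach was used.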