I am using Azure Speech to Text continuous recognition to transcribe an audio file. My speakers are separated into the left and right channels of a stereo WAV file. However, when I run the transcription I am not able to get the channel correctly. I tried to read it from PropertyId.SpeechServiceResponse_JsonResult, but it always returns 0. My expectation is 0 for the left channel and 1 for the right channel.
var speechConfig = SpeechConfig.FromSubscription(/*api key*/, /*region*/);
var audioConfig = AudioConfig.FromWavFileInput(filePath);
var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
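For context, the recognizer is wired for continuous recognition roughly like this (a simplified sketch of my setup; the Recognized handler is where I try to read the channel, shown further down):

// Collect final results as they arrive
recognizer.Recognized += (sender, eventArgs) =>
{
    if (eventArgs.Result.Reason == ResultReason.RecognizedSpeech)
    {
        // channel extraction attempt shown below
    }
};

await recognizer.StartContinuousRecognitionAsync();
// ... wait for the session to finish ...
await recognizer.StopContinuousRecognitionAsync();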
Is there some hidden property or missing configuration to achieve this?
My attempt to read the channel from the JsonResult property:
// Inside the Recognized event handler
var speechServiceResponseJson = eventArgs.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
var channel = 0;

if (!string.IsNullOrEmpty(speechServiceResponseJson))
{
    var speechServiceResponseJsonResult = JObject.Parse(speechServiceResponseJson);

    // Look for a "Channel" field in the detailed JSON result
    if (speechServiceResponseJsonResult.TryGetValue("Channel", StringComparison.InvariantCultureIgnoreCase, out var channelValue))
    {
        channel = channelValue.ToObject<int>();
    }
}
There is currently no way to achieve this through the SDK. What I ended up doing was splitting the audio into one file per channel, processing each channel separately, and then combining the results with an indication of which channel each record originated from.
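Here is a minimal sketch of that workaround. It assumes NAudio (not part of the Speech SDK) for splitting a 16-bit stereo PCM WAV, and the names StereoTranscriber, SplitChannels and TranscribeChannelAsync are illustrative helpers, not SDK API:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;

public static class StereoTranscriber
{
    // Write one mono WAV per channel by muting the other channel (assumes 16-bit stereo PCM).
    public static void SplitChannels(string stereoPath, string leftPath, string rightPath)
    {
        using (var reader = new WaveFileReader(stereoPath))
        {
            var left = new StereoToMonoProvider16(reader) { LeftVolume = 1.0f, RightVolume = 0.0f };
            WaveFileWriter.CreateWaveFile(leftPath, left);
        }

        using (var reader = new WaveFileReader(stereoPath))
        {
            var right = new StereoToMonoProvider16(reader) { LeftVolume = 0.0f, RightVolume = 1.0f };
            WaveFileWriter.CreateWaveFile(rightPath, right);
        }
    }

    // Run continuous recognition on one mono file and tag every phrase with its channel index.
    public static async Task<List<(int Channel, string Text)>> TranscribeChannelAsync(
        SpeechConfig speechConfig, string monoPath, int channel)
    {
        var results = new List<(int Channel, string Text)>();
        var stopped = new TaskCompletionSource<bool>();

        using var audioConfig = AudioConfig.FromWavFileInput(monoPath);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                results.Add((channel, e.Result.Text));
            }
        };

        // The session stops when the file has been fully read (or on error).
        recognizer.SessionStopped += (s, e) => stopped.TrySetResult(true);
        recognizer.Canceled += (s, e) => stopped.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await stopped.Task;
        await recognizer.StopContinuousRecognitionAsync();

        return results;
    }
}

Call SplitChannels once, then TranscribeChannelAsync with channel 0 for the left file and channel 1 for the right file, and merge the two result lists (ordering by the result offsets if you need a chronological transcript).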