Tags: swift, azure, speech-recognition, speech-to-text

stopContinuousRecognition() blocks the app for 5-7 seconds


I am trying to implement speech recognition with the Azure Speech SDK in an iOS project using Swift, and I ran into a problem: the function that stops speech recognition (stopContinuousRecognition()) blocks the app's UI for a few seconds, although there is no memory leak or CPU load. I tried moving this call into DispatchQueue.main.async {}, but it made no difference. Has anyone faced this problem? Is it necessary to run this on a separate thread, and why does the function take so long to finish?

Edit: It is very hard to provide a working example, but basically I am calling this function on a button press:

 private func startListenAzureRecognition(lang: String) {
    // 8 kHz, 16-bit, mono PCM to match the raw mic data pushed into the stream
    let audioFormat = SPXAudioStreamFormat(usingPCMWithSampleRate: 8000, bitsPerSample: 16, channels: 1)
    azurePushAudioStream = SPXPushAudioInputStream(audioFormat: audioFormat!)
    let audioConfig = SPXAudioConfiguration(streamInput: azurePushAudioStream!)!
    var speechConfig: SPXSpeechConfiguration?

    do {
      let sub = "enter your code here"
      let region = "enter your region here"
      speechConfig = try SPXSpeechConfiguration(subscription: sub, region: region)
      speechConfig!.enableDictation()
      speechConfig?.speechRecognitionLanguage = lang
    } catch {
      print("error \(error) happened")
      speechConfig = nil
    }

    self.azureRecognition = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)

    self.azureRecognition!.addRecognizingEventHandler { reco, evt in
      if let text = evt.result.text, !text.isEmpty {
        print(text)
      }
    }
    self.azureRecognition!.addRecognizedEventHandler { reco, evt in
      if let text = evt.result.text, !text.isEmpty {
        print(text)
      }
    }

    do {
      // Use `try` (not `try!`) so the catch block below can actually run
      try self.azureRecognition?.startContinuousRecognition()
    } catch {
      print("error \(error) happened")
    }
  }

And when I press the button again to stop recognition, I am calling this function:

private func stopListenAzureRecognition(){
   DispatchQueue.main.async {
      print("start")
      // app blocks here
      try! self.azureRecognition?.stopContinuousRecognition()
      self.azurePushAudioStream!.close()
      self.azureRecognition = nil
      self.azurePushAudioStream = nil
      print("stop")
    }
  }

Also, I am feeding raw audio data from the mic (recognizeOnce works perfectly for the first phrase, so everything is fine with the audio data).


Solution

  • Try closing the stream first and then stopping the continuous recognition:

    azurePushAudioStream!.close()
    try! azureRecognition?.stopContinuousRecognition()
    azureRecognition = nil
    azurePushAudioStream = nil
    

    You don't even need to do it asynchronously.

    At least this worked for me.
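
    Applied to the stop function from the question, the reordering looks like this (a sketch only; `azureRecognition` and `azurePushAudioStream` are the properties from the question's own code, and error handling is added instead of `try!`):

    ```swift
    private func stopListenAzureRecognition() {
      print("start")
      // Close the push stream first: this signals end-of-input to the SDK,
      // so stopContinuousRecognition() can return without waiting for more audio.
      azurePushAudioStream?.close()
      do {
        try azureRecognition?.stopContinuousRecognition()
      } catch {
        print("error \(error) happened")
      }
      azureRecognition = nil
      azurePushAudioStream = nil
      print("stop")
    }
    ```

    With the stream already closed, the call returns quickly enough that dispatching it off the main queue should no longer be necessary.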