How to store text-to-speech audio in an audio file (iOS)

I am trying to create a video on iOS with text-to-speech (like TikTok does). The only way I could think of was to merge a video and an audio track with AVFoundation, but it seems impossible to write the text-to-speech audio into a .caf file.

This is what I tried:

public async Task amethod(string[] _text_and_position)
{
    string[] text_and_position = (string[])_text_and_position;
    double tts_starting_position = Convert.ToDouble(text_and_position[0]);
    string text = text_and_position[1];

    var synthesizer = new AVSpeechSynthesizer();
    var su = new AVSpeechUtterance(text)
    {
        Rate = 0.5f,
        Volume = 1.6f,
        PitchMultiplier = 1.4f,
        Voice = AVSpeechSynthesisVoice.FromLanguage("en-us")
    };

    Action<AVAudioBuffer> buffer = new Action<AVAudioBuffer>(asss);
    try
    {
        synthesizer.WriteUtterance(su, buffer);
    }
    catch (Exception error) { }
}

public async void asss(AVAudioBuffer _buffer)
{
    try
    {
        var pcmBuffer = (AVAudioPcmBuffer)_buffer;

        if (pcmBuffer.FrameLength == 0)
        {
            // done
        }
        else
        {
            AVAudioFile output = null;
            // append buffer to file
            NSError error;

            if (output == null)
            {
                string filePath = Path.Combine(Path.GetTempPath(), "TTS/" + 1 + ".caf");
                NSUrl fileUrl = NSUrl.FromFilename(filePath);

                output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16, false, out error);
            }
            output.WriteFromBuffer(pcmBuffer, out error);
        }
    }
    catch (Exception error)
    {
        new UIAlertView("Error", error.ToString(), null, "OK", null).Show();
    }
}

This is the same code in Swift:

let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")
var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
   guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
      fatalError("unknown buffer type: \(buffer)")
   }
   if pcmBuffer.frameLength == 0 {
     // done
   } else {
     // append buffer to file
     if output == nil {
       // the initializer throws, so it needs try (try? leaves output nil on failure)
       output = try? AVAudioFile(
         forWriting: URL(fileURLWithPath: "test.caf"),
         settings: pcmBuffer.format.settings,
         commonFormat: .pcmFormatInt16,
         interleaved: false)
     }
     try? output?.write(from: pcmBuffer)
   }
}

The problem with this code is that the call synthesizer.WriteUtterance(su, buffer); always crashes. After reading other posts, I believe this is a bug that results in the callback method (buffer) never being called.

Do you know of any workaround to this bug or any other way to achieve what I am trying to do?

Thanks for your time, have a great day.

EDIT: I commented out synthesizer.SpeakUtterance(su); as ColeX pointed out, and now the callback method is executed. Unfortunately, I still can't store the audio in a file, because I get another error on this line:

output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16 , false ,out error);


Could not initialize an instance of the type 'AVFoundation.AVAudioFile': the native 'initForWriting:settings:commonFormat:interleaved:error:' method returned nil. It is possible to ignore this condition by setting ObjCRuntime.Class.ThrowOnInitFailure to false.


  • The error simply says "An AVSpeechUtterance shall not be enqueued twice".

    So stop making it speak and write at the same time.

    I used your code and commented out synthesizer.SpeakUtterance(su); and the error went away.


    Based on my test, it cannot create the extra subfolder, so remove the TTS/ part and leave just the file name.

    string filePath = Path.Combine(Path.GetTempPath(), 1 + ".caf");
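
    If the TTS subfolder is still wanted, one option that might work is to create the folder yourself before building the file path. This is only a sketch: it assumes the failure comes from the folder not existing yet, and I have not tested it together with AVAudioFile. TtsPaths and PrepareOutputPath are hypothetical names made up for this example.

    ```csharp
    using System;
    using System.IO;

    public static class TtsPaths
    {
        // Hypothetical helper: builds <temp>/<subfolder>/<index>.caf and makes
        // sure the subfolder exists before the path is handed to anything else.
        public static string PrepareOutputPath(string subfolder, int index)
        {
            string folder = Path.Combine(Path.GetTempPath(), subfolder);

            // Creates the folder if it is missing; does nothing if it already exists.
            Directory.CreateDirectory(folder);

            return Path.Combine(folder, index + ".caf");
        }
    }
    ```

    You would then call string filePath = TtsPaths.PrepareOutputPath("TTS", 1); and pass NSUrl.FromFilename(filePath) to AVAudioFile as in the question.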