I am trying to create a video on iOS with text-to-speech (like TikTok does). The only way I could think of was to merge a video and an audio track with AVFoundation, but it seems impossible to write the text-to-speech audio into a .caf file.
This is what I tried:
public async Task amethod(string[] _text_and_position)
{
    try
    {
        string[] text_and_position = (string[])_text_and_position;
        double tts_starting_position = Convert.ToDouble(text_and_position[0]);
        string text = text_and_position[1];
        var synthesizer = new AVSpeechSynthesizer();
        var su = new AVSpeechUtterance(text)
        {
            Rate = 0.5f,
            Volume = 1.6f,
            PitchMultiplier = 1.4f,
            Voice = AVSpeechSynthesisVoice.FromLanguage("en-us")
        };
        // asss is called once per synthesized audio buffer
        Action<AVAudioBuffer> buffer = new Action<AVAudioBuffer>(asss);
        synthesizer.WriteUtterance(su, buffer);
    }
    catch (Exception error) { }
}
public async void asss(AVAudioBuffer _buffer)
{
    try
    {
        var pcmBuffer = (AVAudioPcmBuffer)_buffer;
        if (pcmBuffer.FrameLength == 0)
        {
            // done
        }
        else
        {
            AVAudioFile output = null;
            // append buffer to file
            NSError error;
            if (output == null)
            {
                string filePath = Path.Combine(Path.GetTempPath(), "TTS/" + 1 + ".caf");
                NSUrl fileUrl = NSUrl.FromFilename(filePath);
                output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16, false, out error);
            }
            output.WriteFromBuffer(pcmBuffer, out error);
        }
    }
    catch (Exception error)
    {
        new UIAlertView("Error", error.ToString(), null, "OK", null).Show();
    }
}
This is the same code in Swift:
import AVFoundation

let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "test 123")
utterance.voice = AVSpeechSynthesisVoice(language: "en")
var output: AVAudioFile?

synthesizer.write(utterance) { (buffer: AVAudioBuffer) in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
        fatalError("unknown buffer type: \(buffer)")
    }
    if pcmBuffer.frameLength == 0 {
        // done
    } else {
        // append buffer to file
        do {
            if output == nil {
                // the initializer throws, so it has to be called with try
                output = try AVAudioFile(
                    forWriting: URL(fileURLWithPath: "test.caf"),
                    settings: pcmBuffer.format.settings,
                    commonFormat: .pcmFormatInt16,
                    interleaved: false)
            }
            try output?.write(from: pcmBuffer)
        } catch {
            print("could not write buffer: \(error)")
        }
    }
}
The problem with this code is that "synthesizer.WriteUtterance(su, buffer);" always crashes. After reading other posts, I believe this is a bug that results in the callback method (buffer) never being called.
Do you know of any workaround to this bug or any other way to achieve what I am trying to do?
Thanks for your time, have a great day.
EDIT: I commented out synthesizer.SpeakUtterance(su); as ColeX pointed out, and now the callback method is executed. Unfortunately, I still can't store the audio in a file, because I get another error on this line:
output = new AVAudioFile(fileUrl, pcmBuffer.Format.Settings, AVAudioCommonFormat.PCMInt16 , false ,out error);
Could not initialize an instance of the type 'AVFoundation.AVAudioFile': the native 'initForWriting:settings:commonFormat:interleaved:error:' method returned nil. It is possible to ignore this condition by setting ObjCRuntime.Class.ThrowOnInitFailure to false.
The error simply says: An AVSpeechUtterance shall not be enqueued twice
So stop making it speak and write at the same time.
I used your code and commented out synthesizer.SpeakUtterance(su); and the error was gone.
Based on my test, it does not allow creating an extra subfolder, so remove the TTS/ part and leave just the file name.
string filePath = Path.Combine(Path.GetTempPath(), 1 + ".caf");
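If you would rather keep the files in a subfolder, one possible alternative is to create the folder yourself before opening the file. This is only a minimal sketch and rests on an assumption: that the AVAudioFile constructor fails because the TTS directory does not exist yet. TtsPaths and PrepareOutputPath are made-up names for illustration and are not part of any Xamarin or AVFoundation API.

```csharp
using System;
using System.IO;

public static class TtsPaths
{
    // Hypothetical helper: builds <temp>/<subfolder>/<index>.caf and makes
    // sure the subfolder exists before the path is handed to AVAudioFile.
    public static string PrepareOutputPath(string subfolder, int index)
    {
        string directory = Path.Combine(Path.GetTempPath(), subfolder);

        // CreateDirectory does nothing if the directory is already there.
        Directory.CreateDirectory(directory);

        return Path.Combine(directory, index + ".caf");
    }
}
```

The result would replace the hand-built path, for example string filePath = TtsPaths.PrepareOutputPath("TTS", 1); and the rest of the callback (NSUrl.FromFilename and the AVAudioFile constructor) stays as it is.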