I am maintaining a Push-to-talk VoIP app. When a PTT call is running the app create an audio session
m_AudioSession = AVAudioSession.SharedInstance();
NSError error;
if (!m_AudioSession.SetCategory(AVAudioSession.CategoryPlayAndRecord, AVAudioSessionCategoryOptions.DefaultToSpeaker | AVAudioSessionCategoryOptions.AllowBluetooth, out error))
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error setting the category");
}
if (!m_AudioSession.SetMode(AVAudioSession.ModeVoiceChat, out error))
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error setting the mode");
}
if (!m_AudioSession.OverrideOutputAudioPort(AVAudioSessionPortOverride.Speaker, out error))
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error redirecting the audio to the loudspeaker");
}
if (!m_AudioSession.SetPreferredIOBufferDuration(0.06, out error)) // 60 milli seconds
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error setting the preferred buffer duration");
}
if (!m_AudioSession.SetPreferredSampleRate(8000, out error)) // kHz
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error setting the preferred sample rate");
}
if (!m_AudioSession.SetActive(true, out error))
{
IOSErrorLogger.Log(DammLoggerLevel.Error, TAG, error, "Error activating the audio session");
}
The received audio is played using the OutputAudioQueue and the microphone audio is captured (as mentioned in the Apple Doc: https://developer.apple.com/documentation/avfaudio/avaudiosession/mode/1616455-voicechat) using a Voice-Processing I/O Unit. The initialization code for Voice-Processing I/O Unit is:
AudioStreamBasicDescription audioFormat = new AudioStreamBasicDescription()
{
SampleRate = SAMPLERATE_8000,
Format = AudioFormatType.LinearPCM,
FormatFlags = AudioFormatFlags.LinearPCMIsSignedInteger | AudioFormatFlags.LinearPCMIsPacked,
FramesPerPacket = 1,
ChannelsPerFrame = CHANNELS,
BitsPerChannel = BITS_X_SAMPLE,
BytesPerPacket = BYTES_X_SAMPLE,
BytesPerFrame = BYTES_X_FRAME,
Reserved = 0
};
AudioComponent audioComp = AudioComponent.FindComponent(AudioTypeOutput.VoiceProcessingIO);
AudioUnit.AudioUnit voiceProcessing = new AudioUnit.AudioUnit(audioComp);
AudioUnitStatus unitStatus = AudioUnitStatus.NoError;
unitStatus = voiceProcessing.SetEnableIO(true, AudioUnitScopeType.Input, ELEM_Mic);
if (unitStatus != AudioUnitStatus.NoError)
{
DammLogger.Log(DammLoggerLevel.Warn, TAG, "Audio Unit SetEnableIO(true, AudioUnitScopeType.Input, ELEM_Mic) returned: {0}", unitStatus);
}
unitStatus = voiceProcessing.SetEnableIO(true, AudioUnitScopeType.Output, ELEM_Speaker);
if (unitStatus != AudioUnitStatus.NoError)
{
DammLogger.Log(DammLoggerLevel.Warn, TAG, "Audio Unit SetEnableIO(false, AudioUnitScopeType.Output, ELEM_Speaker) returned: {0}", unitStatus);
}
unitStatus = voiceProcessing.SetFormat(audioFormat, AudioUnitScopeType.Output, ELEM_Mic);
if (unitStatus != AudioUnitStatus.NoError)
{
DammLogger.Log(DammLoggerLevel.Warn, TAG, "Audio Unit SetFormat (MIC-OUTPUT) returned: {0}", unitStatus);
}
unitStatus = voiceProcessing.SetFormat(audioFormat, AudioUnitScopeType.Input, ELEM_Speaker);
if (unitStatus != AudioUnitStatus.NoError)
{
DammLogger.Log(DammLoggerLevel.Warn, TAG, "Audio Unit SetFormat (ELEM 0-INPUT) returned: {0}", unitStatus);
}
unitStatus = voiceProcessing.SetRenderCallback(AudioUnit_RenderCallback, AudioUnitScopeType.Input, ELEM_Speaker);
if (unitStatus != AudioUnitStatus.NoError)
{
DammLogger.Log(DammLoggerLevel.Warn, TAG, "Audio Unit SetRenderCallback returned: {0}", unitStatus);
}
...
voiceProcessing.Initialize();
voiceProcessing.Start();
And the RenderCallback function is:
private AudioUnitStatus AudioUnit_RenderCallback(AudioUnitRenderActionFlags actionFlags, AudioTimeStamp timeStamp, uint busNumber, uint numberFrames, AudioBuffers data)
{
AudioUnit.AudioUnit voiceProcessing = m_VoiceProcessing;
if (voiceProcessing != null)
{
// getting microphone input signal
var status = voiceProcessing.Render(ref actionFlags, timeStamp, ELEM_Mic, numberFrames, data);
if (status != AudioUnitStatus.OK)
{
return status;
}
if (data.Count > 0)
{
unsafe
{
short* samples = (short*)data[0].Data.ToPointer();
for (uint idxSrcFrame = 0; idxSrcFrame < numberFrames; idxSrcFrame++)
{
... send the collected microphone audio (samples[idxSrcFrame])
}
}
}
}
return AudioUnitStatus.NoError;
}
I am facing the problem that if the loudspeaker is enabled: m_AudioSession.OverrideOutputAudioPort(AVAudioSessionPortOverride.Speaker, out error) then the microphone audio is corrupted (some times is impossible to understand the speech). If the loudspeaker is NOT enabled (the AVAudioSessionPortOverride.Speaker is not set) then the audio is very nice.
I have already verified that the NumberChannels in the AudioBuffer returned by the Render function is 1 (mono audio).
Any hit helping solved the problem is very appreciated. Thanks
Update: The AudioUnit_RenderCallback method is called every 32 ms. When the loudspeaker is disabled the received number of frames is 256 which is exact (sample rate is 8000). When the loudspeaker is enabled the received number of frames is 85. In both cases the GetAudioFormat returns the expected values: BitsPerChannel=16, BytesPerFrame=2, FramesPerPacket=1, ChannelsPerFrame=1, SampleRate=8000
Update: I end up using the Sample Rate from the Hardware and performing the down-sampling self. It is must understanding that the Audio Unit should be able to perform the down sampling https://developer.apple.com/library/archive/documentation/MusicAudio/Conceptual/AudioUnitHostingGuide_iOS/AudioUnitHostingFundamentals/AudioUnitHostingFundamentals.html#//apple_ref/doc/uid/TP40009492-CH3-SW11)) but it was not possible for me to make it working when the loudspeaker was enabled.
I hope you are testing this on an actual device and not a simulator.
In the code, have you tried using this:
sampleRate = AudioSession.CurrentHardwareSampleRate;
Rather than forcing the sample rate, it's best to check the sample rate from the Hardware. It could be that during loudspeaker usage, it changes the sample rate and thus creating an issue.
I would suggest recording based on the above changes and see if the audio improves and then experiment with other flags.
Standard recording pattern: https://learn.microsoft.com/en-us/dotnet/api/audiotoolbox.audiostreambasicdescription?view=xamarin-ios-sdk-12#remarks