Using VoiceProcessingIO for Voip and getting raw mic input as well

I am using a VoiceProcessingIO audio unit for voip calls. However, when I set the loud speaker (setting the kAudioSessionOverrideAudioRoute_Speaker audio session property), the PCM data received in the input callback by calling AudioUnitRender has a very low volume.

For a voip call, it is actually fine. The interlocutor hears it fainter, but he hears it. However, I would like to save to disk a good quality version of the input audio, possibly a raw audio from the mic.

Is it actually possible? In my tests I have not be able to do it. When VoiceProcessingIO is in use, the audio from the input-callback is just very low. Perhaps, I can get the unprocessed audio from some other source? Note, VoiceProcessingIO must still be used during the voip call.

The same question on Apple's forum is thread-655091, it has been asked 1 year ago and it has no answers. Closest questions on SO I found are Two audio units? and Effect before render callback?, but they are more concerned about the output of VoiceProcessingIO rather than the input.

An idea would be to add a parallel "raw" RemoteIO unit to get the audio from the mic, but both in Two audio units? and in apple-forum-110816, developers say it will not be possible to add another RemoteIO in parallel to the VoiceProcessingIO, because having set the audio session category as PlayAndRecord and the audio mode as VoiceChat, RemoteIO will not function as usual. I have not had a chance to try it, but it seems possible.

Are there other strategies? Are there some "pre-render input callbacks" called before VoiceProcessingIO unit kicks in and processes the raw data from the mic?

Is it possible to install some TAP between the mic and the VoiceProcessingIO unit?

Solution

AFAIK, there is no public API that allows getting both processed and unprocessed input from the microphone on an iOS device.

If you need processed input (voice processing for echo cancellation, etc.), then your best bet is to just add gain to the audio data for your other needs (via some DSP library, etc.), since it is float data.