c++ windows audio audio-processing wasapi

What is the expected effect of using IAudioClient2::SetClientProperties on a capture client in Windows 10?

The specification of IAudioClient2::SetClientProperties contains only one parameter, but it is not clear to me what to expect from the API given the existing documentation. The parameter is given by:

typedef struct AudioClientProperties {
  UINT32                cbSize;
  BOOL                  bIsOffload;
  AUDIO_STREAM_CATEGORY eCategory;
  AUDCLNT_STREAMOPTIONS Options;
} AudioClientProperties;

I have a capture client and am trying to understand the exact consequence of using different combinations of eCategory and Options.

First of all: if I don't call SetClientProperties on my stream, what are the default settings? Assuming that a corresponding GetClientProperties existed, is it possible to say what it would return?

If I set the stream category to AudioCategory_Speech and the stream option to AUDCLNT_STREAMOPTIONS_RAW, the manual states that

The audio stream is a 'raw' stream that bypasses
all signal processing except for endpoint specific,
always-on processing in the Audio Processing Object (APO), driver, and hardware.

Does that mean that any processing done by the Signal Enhancements is bypassed, or is it some other type of built-in signal processing that is bypassed? I guess I don't really understand the "endpoint specific, always-on" part above.

Also, if I instead use AudioCategory_Communications and AUDCLNT_STREAMOPTIONS_RAW, are these two contradictory in any way? To me it feels as if AudioCategory_Communications should enable components useful for VoIP (e.g. AGC, NS, etc.) while the AUDCLNT_STREAMOPTIONS_RAW flag means "keep the audio path as clean as possible"?

Perhaps I can rephrase the last question. What is the difference in final behavior between using AudioCategory_Communications + AUDCLNT_STREAMOPTIONS_RAW and using AudioCategory_Speech + AUDCLNT_STREAMOPTIONS_RAW?

Solution

The eCategory has behavioral implications that go beyond audio effects. For example, if you have a VOIP app and you start an AudioCategory_Communications stream, that will cause movie apps to pause or be ducked, whether or not you use AUDCLNT_STREAMOPTIONS_RAW.

If your capture client is for VOIP, you want AudioCategory_Communications. If your capture client is for voice command or dictation, you want AudioCategory_Speech.

AUDCLNT_STREAMOPTIONS_RAW is only for very narrow circumstances. Usually you would welcome whatever audio processing was the default for your chosen eCategory.
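
As a rough illustration of the advice above, here is a minimal sketch of filling in AudioClientProperties for a capture client that keeps the default processing for its category. CapturePurpose, MakeCaptureProperties and ApplyCaptureCategory are hypothetical names made up for this example. The sketch assumes that `client` is an activated IAudioClient2 that has not been initialized yet; as far as I know, SetClientProperties has to be called before IAudioClient::Initialize. The non-Windows branch only contains stand-in declarations so the helper can be compiled anywhere; on Windows the real SDK headers are used.

```cpp
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
#include <audioclient.h>   // IAudioClient2, AudioClientProperties
#else
// Stand-in declarations (assumption: they mirror the SDK definitions) so the
// sketch also compiles off Windows. On Windows the real headers above apply.
typedef unsigned int UINT32;
typedef int BOOL;
enum AUDIO_STREAM_CATEGORY {
    AudioCategory_Other = 0,
    AudioCategory_Communications = 3,
    AudioCategory_Speech = 9
};
enum AUDCLNT_STREAMOPTIONS {
    AUDCLNT_STREAMOPTIONS_NONE = 0,
    AUDCLNT_STREAMOPTIONS_RAW = 0x1
};
struct AudioClientProperties {
    UINT32 cbSize;
    BOOL bIsOffload;
    AUDIO_STREAM_CATEGORY eCategory;
    AUDCLNT_STREAMOPTIONS Options;
};
#endif

// Hypothetical description of what the capture stream is used for.
enum class CapturePurpose { Voip, VoiceCommandOrDictation, Unspecified };

// Builds properties that keep the system's default processing for the
// chosen category (Options stays AUDCLNT_STREAMOPTIONS_NONE).
inline AudioClientProperties MakeCaptureProperties(CapturePurpose purpose) {
    AudioClientProperties props = {};  // zero-init: bIsOffload = FALSE, Options = NONE
    props.cbSize = static_cast<UINT32>(sizeof(props));
    switch (purpose) {
    case CapturePurpose::Voip:
        props.eCategory = AudioCategory_Communications;
        break;
    case CapturePurpose::VoiceCommandOrDictation:
        props.eCategory = AudioCategory_Speech;
        break;
    default:
        // Same category you get when SetClientProperties is never called.
        props.eCategory = AudioCategory_Other;
        break;
    }
    return props;
}

#if defined(_WIN32)
// Applies the properties to a not-yet-initialized audio client.
inline HRESULT ApplyCaptureCategory(IAudioClient2* client, CapturePurpose purpose) {
    assert(client != nullptr);
    const AudioClientProperties props = MakeCaptureProperties(purpose);
    return client->SetClientProperties(&props);
}
#endif
```

A VOIP app would call ApplyCaptureCategory(client, CapturePurpose::Voip), check the returned HRESULT, and only then go on to Initialize the client.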

On the other hand, if the ins and outs of audio processing are SUPER important to you, to the point where you are individually evaluating audio drivers on specific hardware, you may determine that certain specific models of computer have audio processing that doesn't work for your app.

In such a case (which should be rare), you should do two things:

1. Reach out to the manufacturer of that computer and tell them what you don't like about their audio processing. Either they convince you that what they're doing is really OK, or you convince them that they really have a problem, in which case they should fix it.
2. While they're working on a fix, your app should, after determining that it is running on such a problem system, use AUDCLNT_STREAMOPTIONS_RAW. Note that your app will then need to apply whatever processing it needs itself, since that processing is no longer provided by the system.
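
The raw-mode fallback described above could look roughly like the following sketch for a VOIP capture client. MakeVoipCaptureProperties, ApplyVoipCaptureProperties and the onKnownProblemSystem parameter are hypothetical; how an app decides that it is running on a problem system is entirely app-specific and not shown. The category stays AudioCategory_Communications in both cases, so the behavioral side of the category (ducking of other apps, for example) is unaffected by the raw option. The non-Windows branch is only a stand-in so the helper compiles anywhere.

```cpp
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
#include <audioclient.h>   // IAudioClient2, AudioClientProperties
#else
// Stand-in declarations (assumption: they mirror the SDK definitions) so the
// sketch also compiles off Windows. On Windows the real headers above apply.
typedef unsigned int UINT32;
typedef int BOOL;
enum AUDIO_STREAM_CATEGORY {
    AudioCategory_Other = 0,
    AudioCategory_Communications = 3
};
enum AUDCLNT_STREAMOPTIONS {
    AUDCLNT_STREAMOPTIONS_NONE = 0,
    AUDCLNT_STREAMOPTIONS_RAW = 0x1
};
struct AudioClientProperties {
    UINT32 cbSize;
    BOOL bIsOffload;
    AUDIO_STREAM_CATEGORY eCategory;
    AUDCLNT_STREAMOPTIONS Options;
};
#endif

// Requests raw mode only when the app has decided (by its own, app-specific
// means) that the system's default processing does not work for it.
inline AudioClientProperties MakeVoipCaptureProperties(bool onKnownProblemSystem) {
    AudioClientProperties props = {};  // zero-init: bIsOffload = FALSE
    props.cbSize = static_cast<UINT32>(sizeof(props));
    props.eCategory = AudioCategory_Communications;
    props.Options = onKnownProblemSystem ? AUDCLNT_STREAMOPTIONS_RAW
                                         : AUDCLNT_STREAMOPTIONS_NONE;
    return props;
}

#if defined(_WIN32)
// Applies the properties to a not-yet-initialized audio client. In raw mode
// the app has to supply its own processing (AGC, NS, ...) on captured audio.
inline HRESULT ApplyVoipCaptureProperties(IAudioClient2* client,
                                          bool onKnownProblemSystem) {
    assert(client != nullptr);
    const AudioClientProperties props =
        MakeVoipCaptureProperties(onKnownProblemSystem);
    return client->SetClientProperties(&props);
}
#endif
```

Keeping the decision in a single boolean makes it easy to drop the raw path again once the manufacturer ships a fixed driver.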

Your app can query for what audio effects would be applied to its chosen stream category, both in normal mode and raw mode, using the audio effects discovery API. There's a sample here: https://github.com/microsoftarchive/msdn-code-gallery-microsoft/tree/master/Official%20Windows%20Platform%20Sample/Audio%20effects%20discovery%20sample

The default, if you do not call IAudioClient2::SetClientProperties, is eCategory = AudioCategory_Other, which is usually not what you want.