
How to mix multiple inputs and keep synchronization?


I'm having quite a hard time figuring out how to get multiple sources of audio into one stream without significant delay. I've followed NAudio's documentation as far as possible, and have written the following:

    public void Start()
    {
        _format = new WaveFormat(48000, 16, 2);

        // The mixer itself works on 32-bit IEEE float samples
        _mixingSampleProvider = new MixingSampleProvider(WaveFormat.CreateIeeeFloatWaveFormat(48000, 2));

        // Wrap the mixed output in a compressor before it reaches the outputs
        _compressorStream = new SimpleCompressorStream(new MixingWaveStream(_mixingSampleProvider.ToWaveProvider()));
        _compressorStream.Enabled = true;

        foreach (var p in _activeRenderers)
        {
            p.Start(_compressorStream);
        }
    }

    public void RegisterProvider(IAudioProvider provider)
    {
        // Each registered provider becomes one input of the mixer
        var audioWaveProvider = new AudioWaveProvider(provider, _format);
        _providers.Add(audioWaveProvider);
        _mixingSampleProvider.AddMixerInput(audioWaveProvider.WaveProvider);
    }

MixingWaveStream is a conversion from an IWaveProvider to a WaveStream. p.Start() simply initializes a WasapiOut at this point and calls Play(); there is only one output right now (I realize the current setup will not work with multiple outputs).
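Since MixingWaveStream isn't shown here, a minimal sketch of what such a wrapper can look like (simplified, and assuming an endless, non-seekable live stream rather than the exact class):

    using System;
    using NAudio.Wave;

    // Simplified sketch: exposes an IWaveProvider as a non-seekable,
    // endless WaveStream so it can feed a WaveStream-based consumer.
    public class MixingWaveStream : WaveStream
    {
        private readonly IWaveProvider _source;
        private long _position;

        public MixingWaveStream(IWaveProvider source)
        {
            _source = source;
        }

        public override WaveFormat WaveFormat => _source.WaveFormat;

        // A live mix has no natural end
        public override long Length => long.MaxValue;

        public override long Position
        {
            get => _position;
            set => throw new NotSupportedException("A live mix cannot seek.");
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int read = _source.Read(buffer, offset, count);
            _position += read;
            return read;
        }
    }

And my AudioWaveProvider: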

    public AudioWaveProvider(IAudioProvider provider, WaveFormat format)
    {
        // Resample if the provider's format differs from the target format
        if (provider.BitDepth != format.BitsPerSample || provider.Channels != format.Channels || provider.SampleRate != format.SampleRate)
        {
            // Buffer incoming samples in their native format, then resample to the target
            _waveProviderToSendSamples = new BufferedWaveProvider(new WaveFormat(provider.SampleRate, provider.BitDepth, provider.Channels));
            WaveProvider = new MediaFoundationResampler(_waveProviderToSendSamples, format);
        }
        else
        {
            // Formats match, so buffer directly in the target format
            WaveProvider = new BufferedWaveProvider(format);
            _waveProviderToSendSamples = (BufferedWaveProvider)WaveProvider;
        }

        AudioProvider = provider;
        provider.ProvideSamples += Provider_ProvideSamples;
    }

    private void Provider_ProvideSamples(IAudioProvider provider, AudioSamples samples)
    {
        // Push the raw sample bytes from the provider into the buffer
        _waveProviderToSendSamples.AddSamples(samples.Samples, 0, (int)samples.Samples.Length);
    }

My audio providers (in this case just a video played through libvlc) provide samples through an event.
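For context, this is roughly the shape of the provider types as used above (a hypothetical reconstruction inferred from the calls in AudioWaveProvider, not the exact definitions):

    // Hypothetical reconstruction, inferred from the calls above
    public delegate void ProvideSamplesHandler(IAudioProvider provider, AudioSamples samples);

    public interface IAudioProvider
    {
        int SampleRate { get; } // e.g. 44100 for the libvlc source
        int BitDepth { get; }   // bits per sample, e.g. 16
        int Channels { get; }   // e.g. 2 for stereo
        event ProvideSamplesHandler ProvideSamples;
    }

    // One batch of raw interleaved PCM bytes
    public class AudioSamples
    {
        public byte[] Samples { get; set; }
    }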

It all works, but with a significant delay (about 100 ms, judging by the video frames I'm outputting). I realize that adding a mixer, a BufferedWaveProvider and (potentially) a resampler adds overhead, but I'd like to know what the best practice is for keeping video and audio in sync.

Edit: My input is 44100 Hz, so the MediaFoundationResampler is used. After a bit of testing, this turns out to be the cause of most of the delay, but since I have multiple inputs with different formats I can't simply drop it.

So, how do I keep audio and video in sync? Or how do I reduce the time MediaFoundationResampler takes to resample? What is the best practice here? I could use multiple outputs, but using a mixer has been recommended instead.


Solution

  • Yes, MediaFoundationTransform has a hard-coded read size of one second of audio, as that makes it easy to work out what the sizes of the source and destination buffers should be. I always meant to make this configurable after experimenting with what the optimal size is, but since I was only using it in scenarios where reading ahead was possible, I never got round to it.

    If you can create your own custom build of NAudio, then you can try smaller sourceBuffer sizes.
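
    For illustration, this is the kind of change involved, in the constructor in MediaFoundationTransform.cs of a custom build (the sourceBuffer field matches the NAudio source at the time of writing, but check your version):

        // The stock constructor sizes the read buffer to one second of audio:
        //   sourceBuffer = new byte[sourceProvider.WaveFormat.AverageBytesPerSecond];
        // A custom build can shrink that to, say, ~100 ms, keeping the size
        // a whole number of frames (a multiple of BlockAlign):
        int bytesPer100ms = sourceProvider.WaveFormat.AverageBytesPerSecond / 10;
        bytesPer100ms -= bytesPer100ms % sourceProvider.WaveFormat.BlockAlign;
        sourceBuffer = new byte[bytesPer100ms];

    Smaller source buffers mean more frequent transform calls, so there is a trade-off between latency and per-call overhead; measuring a few sizes is the practical way to find the optimum.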