Tags: ios, objective-c, audio, synchronization, audio-comparison

How to find the offset between two audio files, one noisy and one clean?


I have a scenario in which the user captures a concert scene with the real-time audio of the performer, while at the same time the device downloads the live stream from the audio broadcaster's device. Later I replace the noisy real-time audio (captured while recording) with the good-quality audio I streamed and saved on the phone. Right now I set the audio offset manually, by trial and error, while merging, so that I can sync the audio and video activity at the exact position.

Now I want to automate the synchronisation process. Instead of merging the video with the clean audio at a manually supplied offset, I want the merge to happen automatically, with proper sync.

For that I need to find the offset at which I should replace the noisy audio with the clean audio. For example, when the user starts and then stops recording, I will take that sample of real-time audio, compare it with the live-streamed audio, extract the matching part, and sync it at exactly the right time.

Does anyone have an idea how to find the offset by comparing the two audio files, and then sync the clean audio with the video?


Solution

Here's a concise answer:

• It's not easy; it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding of the technique before you try to port it to iOS.
• On iOS, I would suggest the Accelerate framework for fast Fourier transforms and the other DSP routines involved; see the sketch after this list.
• I don't agree with the other answer about doing it on a server: devices are plenty powerful these days, and a user wouldn't mind a few seconds of processing for something seemingly magical to happen.
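
Assuming the linked solution takes the usual cross-correlation approach, here is a minimal sketch of the idea using Accelerate's vDSP. The helper name is mine, and it assumes both buffers are mono float PCM at the same sample rate, with the clean stream at least as long as the recorded snippet. This is the direct O(N·P) correlation; for long buffers you would switch to the FFT-based variant the links describe.

```objc
#import <Accelerate/Accelerate.h>
#include <stdlib.h>

// Hypothetical helper: slide the short noisy snippet across the longer
// clean stream and return the sample offset with the highest correlation.
static vDSP_Length BestOffsetOfSnippet(const float *clean, vDSP_Length cleanLen,
                                       const float *noisy, vDSP_Length noisyLen)
{
    // One correlation value per candidate lag of the snippet in the stream.
    vDSP_Length lagCount = cleanLen - noisyLen + 1;
    float *correlation = malloc(lagCount * sizeof(float));

    // With a positive filter stride, vDSP_conv computes correlation:
    // correlation[n] = sum over p of clean[n + p] * noisy[p]
    vDSP_conv(clean, 1, noisy, 1, correlation, 1, lagCount, noisyLen);

    // The lag with the largest correlation is the best alignment.
    float peak = 0.0f;
    vDSP_Length peakIndex = 0;
    vDSP_maxvi(correlation, 1, &peak, &peakIndex, lagCount);

    free(correlation);
    return peakIndex; // offset in samples; divide by the sample rate for seconds
}
```

Dividing peakIndex by the sample rate gives the offset in seconds at which to splice in the clean audio. In practice you would also want to normalise the correlation, so that loud passages in the clean stream don't dominate the peak.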

Edit

As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).

What if, instead of getting complicated with signal processing, there were another way? After some thought, there might be. If you meet all of the following conditions:

• You are in control of the server component (the audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver communicate in a way that allows accurate time synchronisation

...then the task of calculating the audio offset becomes reasonably trivial. You could use NTP or some other, more accurate time synchronisation method so that there is a global reference point for time. Then it is as simple as calculating the difference between the audio streams' time codes, where the time codes are based on the global reference time.
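
To make that concrete, here is a sketch of the arithmetic. All names are hypothetical; it assumes the stream carries time codes stamped against the shared clock, and that the broadcaster's latency from the second condition above is known.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch of the time-code approach. All three inputs are
// seconds on the shared (e.g. NTP-synchronised) clock.
static NSTimeInterval CleanAudioOffsetSeconds(NSTimeInterval recordingStartGlobal,
                                              NSTimeInterval streamTimecodeGlobal,
                                              NSTimeInterval broadcastLatency)
{
    // The stream's stamped capture time, corrected by the broadcaster's known
    // capture-to-stream latency, tells you when that audio actually happened.
    // Subtracting that from the recording start gives how far into the clean
    // stream the phone's recording began.
    return recordingStartGlobal - (streamTimecodeGlobal + broadcastLatency);
}
```

No signal processing at all: once both clocks agree, finding the splice point is a single subtraction.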