c# fft spectrum audio-fingerprinting

C# audio fingerprinting in small WAVs


I need to find a matching WAV file in a small database of around 40 files, each between 5 and 7 seconds long.

These WAV files are the recordings that the telephone service provider gives you when you make a call.

Example:

https://clyp.it/lnz1aybd

My needle is 1 or 2 seconds long.

All the WAVs are 16-bit PCM, 8000 Hz, mono.

I tried using Aurio.AudioFingerPrint without success:

https://github.com/protyposis/Aurio

// Setup the sources
var audioTrack1 = new AudioTrack(new FileInfo("Full5secs.wav"));
var audioTrack2 = new AudioTrack(new FileInfo("Part2Secs.wav"));

// Setup the fingerprint generator
var defaultProfile = FingerprintGenerator.GetProfiles()[0];
var generator = new FingerprintGenerator(defaultProfile);

// Create a fingerprint store
var store = new FingerprintStore(defaultProfile);

// Setup the generator event listener (a subfingerprint is a hash with its temporal index)
generator.SubFingerprintsGenerated += (sender, e) => {
    var progress = (double)e.Index / e.Indices;
    var hashes = e.SubFingerprints.Select(sfp => sfp.Hash);
    store.Add(e);
};

// Generate fingerprints for both tracks
generator.Generate(audioTrack1);
generator.Generate(audioTrack2);

// Check if tracks match
if (store.FindAllMatches().Count > 0) {
    Console.WriteLine("overlap detected!");
}

What's wrong with my approach?
Does anyone know which configuration I'm missing for short WAVs?


Solution

  • Might be too late, but I'm the author of Aurio and can help you with that. I'm assuming that you're using the FingerprintGenerator from the Aurio.Matching.HaitsmaKalker2002 namespace, but it will be similar with other fingerprinting methods from other namespaces too.

    Your problem is that a fingerprint with the default configuration needs a little over 3 seconds of audio. A 2-second audio file therefore yields no fingerprint, so you cannot get a match.

    By default, a fingerprint of the HaitsmaKalker2002 method consists of 256 sub-fingerprints. This length is configured in the FingerprintStore, where the fingerprint matching takes place. Sub-fingerprints are calculated from slices (windows) taken from a down-sampled audio stream with a sampling rate of 5512 Hz. Each window is 2048 samples long, and a new window starts every 64 samples. These values are set in the profile that configures the FingerprintGenerator, which extracts the fingerprints; you can find them in the DefaultProfile. With this configuration, you need at least (255 * 64 + 2048) / 5512 ≈ 3.3 seconds of audio to yield a fingerprint. Every additional fingerprint needs only 64 more audio samples, so a 4-second clip already yields 313 sub-fingerprints (58 overlapping fingerprints), and the chance of matching is much higher.
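
    The arithmetic above can be checked with a few lines of plain C#. This is only a sketch of the counting model described in this answer (one sub-fingerprint per complete window, one fingerprint per run of consecutive sub-fingerprints); it does not call Aurio, and the constant and function names are made up for illustration.

    ```csharp
    using System;
    using System.Globalization;

    // Values quoted in the answer for the default profile (names are illustrative, not Aurio's API).
    const int SampleRate = 5512;      // Hz, after down-sampling
    const int WindowSize = 2048;      // samples per analysis window
    const int FrameStep = 64;         // samples between the starts of consecutive windows
    const int FingerprintSize = 256;  // sub-fingerprints per fingerprint

    // Number of complete windows (sub-fingerprints) that fit into the given number of samples.
    static int SubFingerprintCount(int samples, int window, int step) =>
        samples < window ? 0 : (samples - window) / step + 1;

    // Number of full fingerprints, assuming one fingerprint per possible start position.
    static int FingerprintCount(int subFingerprints, int size) =>
        Math.Max(0, subFingerprints - size + 1);

    // Shortest clip, in seconds, that yields exactly one fingerprint.
    static double MinSeconds(int rate, int window, int step, int size) =>
        ((size - 1) * step + window) / (double)rate;

    double min = MinSeconds(SampleRate, WindowSize, FrameStep, FingerprintSize);
    Console.WriteLine("minimum: " + min.ToString("F2", CultureInfo.InvariantCulture) + " s"); // minimum: 3.33 s

    foreach (int seconds in new[] { 2, 4 })
    {
        int subs = SubFingerprintCount(seconds * SampleRate, WindowSize, FrameStep);
        int prints = FingerprintCount(subs, FingerprintSize);
        Console.WriteLine($"{seconds} s: {subs} sub-fingerprints, {prints} fingerprints");
    }
    // 2 s: 141 sub-fingerprints, 0 fingerprints
    // 4 s: 313 sub-fingerprints, 58 fingerprints
    ```

    Under this model the 2-second needle produces 141 sub-fingerprints, which is fewer than the 256 needed for a single fingerprint, so the store has nothing to match against.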

    In your case, you need to shorten the audio length required for a fingerprint. You can do that by creating a custom profile for the FingerprintGenerator (extend the DefaultProfile or adjust its config values), by adjusting the settings of the matching stage in the FingerprintStore, or both. To cut the minimum audio time roughly in half, you can e.g. double the SampleRate or halve the FrameStep of the DefaultProfile, or halve the fingerprint length, or combine these options:

    // Setup the fingerprint generator
    var defaultProfile = FingerprintGenerator.GetProfiles()[0];
    defaultProfile.SampleRate = 11025; // Adjust the profile
    var generator = new FingerprintGenerator(defaultProfile);
    
    // Create a fingerprint store
    var store = new FingerprintStore(defaultProfile);
    // Set the fingerprint length to 128 instead of the default 256
    store.FingerprintSize = 128;
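
    To get a feeling for how much each of these adjustments shortens the required audio, the minimum duration can be tabulated for a few combinations. This is a rough sketch in plain C#, not Aurio code; it assumes the window stays at 2048 samples when the other values change, and the labels and the MinSeconds helper are invented for this example.

    ```csharp
    using System;

    // Shortest clip, in seconds, that yields one fingerprint:
    // (size - 1) hops of `step` samples plus one full window, divided by the sample rate.
    static double MinSeconds(int rate, int window, int step, int size) =>
        ((size - 1) * step + window) / (double)rate;

    const int Window = 2048; // assumed to stay fixed across all variants

    var variants = new (string Label, int Rate, int Step, int Size)[]
    {
        ("defaults",                    5512,  64, 256),
        ("doubled sample rate",         11025, 64, 256),
        ("halved frame step",           5512,  32, 256),
        ("halved fingerprint size",     5512,  64, 128),
        ("sample rate x2, size / 2",    11025, 64, 128),
    };

    foreach (var v in variants)
    {
        double seconds = MinSeconds(v.Rate, Window, v.Step, v.Size);
        Console.WriteLine($"{v.Label,-28} {seconds:F2} s");
    }
    ```

    With these numbers, the combination used in the snippet above (sample rate 11025 and fingerprint size 128) needs roughly 0.92 seconds of audio, so even a 1-second needle would yield a handful of fingerprints. Halving only the frame step or only the fingerprint size lands at about 1.85 seconds, slightly more than half of the default, because the 2048-sample window does not shrink.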
    

    Another method might be to lengthen the input audio by padding it with silence, but then you might have to raise store.Threshold to allow a higher error margin (because the actual audio payload is too short and will never fully match anywhere). You'd have to do the padding externally though, because this use case is currently not possible through Aurio's API.
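
    Since the padding has to happen outside Aurio, here is a minimal sketch of what it could look like on already decoded 16-bit mono PCM samples. PadWithSilence is a hypothetical helper, not part of Aurio; reading and writing the WAV container is left out, and appending the silence at the end (rather than the start or both sides) is just one possible choice.

    ```csharp
    using System;

    // Appends zero-valued samples (digital silence) until the clip is at least targetSeconds long.
    // Returns the input unchanged if it is already long enough.
    static short[] PadWithSilence(short[] samples, int sampleRate, double targetSeconds)
    {
        int targetLength = (int)Math.Ceiling(targetSeconds * sampleRate);
        if (samples.Length >= targetLength)
            return samples;

        var padded = new short[targetLength];          // new arrays are zero-initialised
        Array.Copy(samples, padded, samples.Length);   // original audio first, silence after it
        return padded;
    }

    // Hypothetical 2-second needle at 8000 Hz, padded to 4 seconds.
    var needle = new short[2 * 8000];
    needle[0] = 1234; // stand-in for real audio data
    short[] result = PadWithSilence(needle, 8000, 4.0);
    Console.WriteLine(result.Length); // 32000
    ```

    The padded samples would then be written back to a WAV file and fingerprinted like any other track.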

    Please keep in mind that the default values have been chosen because they lead to good results. Changing them without knowing what you're doing might lead to lots of false positives or misses, but since your input files are very short you'll have to give it a try. I recommend trying AudioAlign, which is basically a GUI around Aurio. There you can add your two test files and experiment with the FingerprintSize and Threshold values very easily. It will even show you the matches in the audio files graphically, and you can listen to them directly.