Search code examples
algorithmaudiosignal-processingsample-rate

How can I detect these audio abnormalities?


iOS has an issue recording through some USB audio devices. It cannot be reliably reproduced (happens every 1 in ~2000-3000 records in batches and silently disappears), and we currently manually check our audio for any recording issues. It results in small numbers of samples (1-20) being shifted by a small number that sounds like a sort of 'crackle'.

They look like this:

Waveform with abnormality

closer:

enter image description here

closer:

enter image description here

another, single sample error elsewhere in the same audio file:

enter image description here

The question is, how can these be algorithmically be detected (assuming direct access to samples) whilst not triggering false positives on high frequency audio with waveforms like this:

enter image description here

Bonus points: after determining as many errors as possible, how can the audio be 'fixed'?

More bonus points: what could be causing this issue in the iOS USB audio drivers/hardware (assuming it is there).


Solution

  • I do not think there is an out of the box solution to find the disturbances, but here is one (non standard) way of tackling the problem. Using this, I could find most intervals and I only got a small number of false positives, but the algorithm could certainly use some fine tuning.

    My idea is to find the start and end point of the deviating samples. The first step should be to make these points stand out more clearly. This can be done by taking the logarithm of the data and taking the differences between consecutive values.

    In MATLAB I load the data (in this example I use dirty-sample-other.wav)

    y1 = wavread('dirty-sample-pictured.wav');
    y2 = wavread('dirty-sample-other.wav');
    y3 = wavread('clean-highfreq.wav');
    
    data = y2;
    

    and use the following code:

    logdata = log(1+data);
    difflogdata = diff(logdata);
    

    So instead of this plot of the original data:

    original data

    we get:

    diff-log-data

    where the intervals we are looking for stand out as a positive and negative spike. For example zooming in on the largest positive value in the plot of logarithm differences we get the following two figures. One for the original data:

    Original data zoomed

    and one for the difference of logarithms:

    Diff-log-data zoomed

    This plot could help with finding the areas manually but ideally we want to find them using an algorithm. The way I did this was to take a moving window of size 6, computing the mean value of the window (of all points except the minimum value), and compare this to the maximum value. If the maximum point is the only point that is above the mean value and at least twice as large as the mean it is counted as a positive extreme value.

    I then used a threshold of counts, at least half of the windows moving over the value should detect it as an extreme value in order for it to be accepted.

    Multiplying all points with (-1) this algorithm is then run again to detect the minimum values.

    Marking the positive extremes with "o" and negative extremes with "*" we get the following two plots. One for the differences of logarithms:

    diff-log-data with found extremes

    and one for the original data:

    original data with found extremes

    Zooming in on the left part of the figure showing the logarithmic differences we can see that most extreme values are found:

    diff-log-data with found extremes zoomed

    It seems like most intervals are found and there are only a small number of false positives. For example running the algorithm on 'clean-highfreq.wav' I only find one positive and one negative extreme value.

    Single values that are falsely classified as extreme values could perhaps be weeded out by matching start and end-points. And if you want to replace the lost data you could use some kind of interpolation using the surrounding data-points, perhaps even a linear interpolation will be good enough.

    Here is the MATLAB-code I used:

    function test20()
    clc
    clear all
    
    y1 = wavread('dirty-sample-pictured.wav');
    y2 = wavread('dirty-sample-other.wav');
    y3 = wavread('clean-highfreq.wav');
    
    data = y2;
    
    logdata = log(1+data);
    difflogdata = diff(logdata);
    
    figure,plot(data),hold on,plot(data,'.')
    figure,plot(difflogdata),hold on,plot(difflogdata,'.')
    
    figure,plot(data),hold on,plot(data,'.'),xlim([68000,68200])
    figure,plot(difflogdata),hold on,plot(difflogdata,'.'),xlim([68000,68200])
    
    k = 6;
    myData = difflogdata;
    myPoints = findPoints(myData,k);
    
    myData2 = -difflogdata;
    myPoints2 = findPoints(myData2,k);
    
    figure
    plotterFunction(difflogdata,myPoints>=k,'or')
    hold on
    plotterFunction(difflogdata,myPoints2>=k,'*r')
    
    figure
    plotterFunction(data,myPoints>=k,'or')
    hold on
    plotterFunction(data,myPoints2>=k,'*r')
    
    end
    
    function myPoints = findPoints(myData,k)
    
    iterationVector = k+1:length(myData);
    myPoints = zeros(size(myData));
    for i = iterationVector
        subVector = myData(i-k:i);
        meanSubVector = mean(subVector(subVector>min(subVector)));
        [maxSubVector, maxIndex] = max(subVector);
        if (sum(subVector>meanSubVector) == 1 && maxSubVector>2*meanSubVector)
            myPoints(i-k-1+maxIndex) = myPoints(i-k-1+maxIndex) +1;
        end
    end
    
    end
    
    function plotterFunction(allPoints,extremeIndices,markerType)
    
    extremePoints = NaN(size(allPoints));
    extremePoints(extremeIndices) = allPoints(extremeIndices);
    plot(extremePoints,markerType,'MarkerSize',15),
    hold on
    plot(allPoints,'.')
    plot(allPoints)
    
    end
    

    Edit - comments on recovering the original data

    Here is a slightly zoomed out view of figure three above: (the disturbance is between 6.8 and 6.82)

    Original data zommed out

    When I examine the values, your theory about the data being mirrored to negative values does not seem to fit the pattern exactly. But in any case, my thought about just removing the differences is certainly not correct. Since the surrounding points do not seem to be altered by the disturbance, I would probably go back to the original idea of not trusting the points within the affected region and instead using some sort of interpolation using the surrounding data. It seems like a simple linear interpolation would be a quite good approximation in most cases.