In a set of possibly noisy data, given that I know the real data should have evenly spaced peaks, how can I detect the real desired data using MATLAB?


I have a set of measured data that should theoretically contain only the power peaks arriving at a receiver, and I know these peaks should arrive at intervals of 4 seconds (at least approximately, since in a real scenario I expect some deviation).

The problem is that the system can also receive random data, either from sources other than the one I'm interested in studying or as an echo from the same source, as in the example image: [image: Example data]

In this image, the blue data is the real data, and the red data is random data that should be ignored.

What's the best way, using MATLAB (and possibly some statistics), to detect the points that are most probably the wanted data? (Sometimes the "parasite" data can also be spaced 4 seconds apart, if it is an echo.)


Solution

  • The following code finds the time tags that belong to the longest series whose gaps are close to a multiple of 4.
    The algorithm allows valid samples to be missing from the series (it does not require continuity).

    %T is the X coordinate of your graph (time tag).
    %Notice: The amplitude is irrelevant here.
    T = [1, 2, 5, 6, 7, 10, 12, 14];
    
    %Create all possible pairs of indices into T.
    [Y, X] = meshgrid(1:length(T));
    
    %G is the matrix of all pairwise gaps:
    %T(1) - T(1), T(2) - T(1), T(3) - T(1)...
    %Computing every gap (including the mirrored ones and T(1) - T(1)) is inefficient,
    %but it is a common vectorized idiom in MATLAB.
    G = T(X) - T(Y);
    
    %Ignore sign of gaps.
    G = abs(G);
    
    %Remove all gaps that are not a multiple of 4 within a tolerance of 0.1.
    %This removes gaps like 5, 11, and 12.7...
    G((mod(G, 4) > 0.1) & (mod(G, 4) < 3.9)) = 0;
    
    %C is a counter vector: for each time sample, it counts the nonzero gaps left in G,
    %i.e. how many other samples lie (approximately) a multiple of 4 away from it.
    C = sum(G > 0, 1);
    
    %Only indices belonging to the longest series are valid.
    ind = (C == max(C));
    
    %Result: time tags belonging to the longest series.
    resT = T(ind)
    

    Note:
    If you are looking for the longest series without gaps, you can use the following code:

    T = [1, 2, 5, 6, 7, 10, 12, 14];
    len = length(T);
    C = zeros(1, len);
    
    for i = 1:len-1
        j = i;
        k = i+1;
        while (k <= len)
            gap = T(k) - T(j);
            if (abs(gap - 4) < 0.1)
                C(i) = C(i) + 1; %Increase series counter.
    
                %Continue searching forward from the matched sample.
                j = k;
                k = j+1;
            else
                k = k+1;
            end
    
            if (gap > 4.1)
                %Break series if gap is above 4.1
                break;
            end                
        end
    end
    
    %Now find(C == max(C)) is the index of the start of the longest contiguous series.
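
The gap-free search above yields only the index where the longest series starts. As a minimal sketch (not part of the original answer), the time tags of that series could be listed by walking forward from the start index with the same 4-second test. The names `period`, `tol`, `startIdx`, and `series` are assumptions introduced for illustration, and `startIdx = 2` is the value that `find(C == max(C))` gives for this particular `T`.

```matlab
T = [1, 2, 5, 6, 7, 10, 12, 14]; % same sample time tags as above
period = 4;   % expected spacing in seconds (assumed name)
tol = 0.1;    % allowed deviation per gap (assumed name)
startIdx = 2; % find(C == max(C)) from the gap-free search, for this T

% Walk forward from startIdx, keeping each sample that lies one period
% (within tol) after the last accepted sample.
series = T(startIdx);
for k = startIdx+1:length(T)
    gap = T(k) - series(end);
    if abs(gap - period) < tol
        series(end+1) = T(k); % accept and continue from this sample
    elseif gap > period + tol
        break; % the next expected sample is missing, so the series ends
    end
end

series
```

Because each gap is measured from the last accepted sample rather than from the first one, a small per-interval deviation does not accumulate against the tolerance.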