Trouble filtering potentiometer data (noisy, high spikes)

I've collected some data from a potentiometer using an Arduino microcontroller. Here is the data which was sampled at 500 Hz (it's a lot of data):

http://cl.ly/3D3s1U3m1R1T?_ga=1.178935463.2093327149.1426657579

If you zoom in you can see that essentially I have a pot that just rotates back and forth i.e., I should see a linear increase and then a linear decrease. While the general shape of the data affirms this, almost every single time there's some really freaking annoying (sometimes surprisingly wide) spikes that get in the way of a really nice shape. Is there any way I can make some type of algorithm or filter which fixes this? I tried a median filter, and using the percentiles but neither worked. I mean I feel like it shouldn't be the hardest thing because I can clearly see what it should look like-basically the minimum of where the spikes occur-but for some reason everything I try miserably fails or at least looses the integrity of the original data.

I'd appreciate any help I can get with this.

Solution

There are many ways to tackle your problem. However none of them will ever be perfect. I'll give you 2 approaches here.

Moving average (low pass filter)

In Matlab, one easy way to "low pass" filter your data without having to explicitly use FFT, is to use the filter function` (available in the base package, you do not need any specific toolbox).

You create a kernel for the filter, and apply it twice (once in each direction), to cancel the phase shift introduced. This is in effect a "Moving Average" filter with zero phase shift. The size (length) of the kernel will control how heavy the averaging process will be.

So for example, 2 different filter lengths:

n = 100 ; %// length of the filter
kernel = ones(1,n)./n ;
q1 = filter( kernel , 1 , fliplr(p) ) ;  %// apply the filter in one direction
q1 = filter( kernel , 1 , fliplr(q1) ) ; %// re-apply in the opposite direction to cancel phase shift

n = 500 ; %// length of the filter
kernel = ones(1,n)./n ;
q2 = filter( kernel , 1 , fliplr(filter( kernel , 1 , fliplr(p) )) ) ; %// same than above in one line

Will produce on your data: denoising1

As you can see, each filter size has its pros and cons. The more you filter, the more of your spikes you cancel, but the more you deform your original signal. It is up to you to find your optimum settings.

2) Search derivative anomalies

This is a different approach. You can observe on your signal that the spikes are mostly sudden, it means the change of value of your signal is rapid, and luckily faster than the "normal" rate of change of your desired signal. It means that you can calculate the derivative of your signal, and identify all the spikes (the derivative will be much higher than for the rest of the curve).
Since this only identify the "beginning" and "end" of the spikes (not the occasional plateau in the middle), we will need to extend a little bit the zone identified as faulty by this method.
When the identification of faulty data is done, you just discard these data points and re-interpolate your curve over the original interval (taking support on the points you have left).

%% // Method 2 - Reinterpolation of cancelled data

%// OPTIONAL slightly smooth the initial data to get a cleaner derivative
n = 10 ; kernel = ones(1,n)./n ;
ps = filter( kernel , 1 , fliplr(filter( kernel , 1 , fliplr(p) )) ) ;

%// Identify the derivative anomalies (too high or too low)
dp = [0 diff(ps)] ;             %// calculate the simplest form of derivative (just the difference between consecutive points)
dpos = dp >= (std(dp)/2) ;      %// identify positive derivative above a certain threshold (I choose the STD but you could choose something else)
dneg = dp <= -(std(dp)/2) ;     %// identify negative derivative above the threshold
ixbad = dpos | dneg ;           %// prepare a global vector of indices to cancel

%// This will cancel "nPtsOut" on the RIGHT of each POSITIVE derivative
%// point identified, and "nPtsOut" on the LEFT of each NEGATIVE derivative
nPtsOut = 100 ;                 %// decide how many points after/before spikes we are going to cancel
for ii=1:nPtsOut
    ixbad = ixbad | circshift( dpos , [0 ii]) | circshift( dneg , [0 -ii]) ;
end

%// now we just reinterpolate the missing gaps
xp = 1:length(p) ;                              %// prepare a base for reinterpolation
pi = interp1( xp(~ixbad) , p(~ixbad) , xp )  ;  %// do the reinterpolation

This will produce:
denoising1

The red signal is the result of the above moving average, the green signal is the result of the derivative approach.
There are also settings you can change to adjust this result (the threshold for the derivative, 'nPtsOut' and even the initial smoothing of the data).

As you can see, for the same amount of spike cancellation than the moving average method, it respect a bit more the integrity of the initial data. However it is not perfect either and some interval will still be deformed. But as I said at the beginning, no method is ever perfect.