Tags: python, scipy, signal-processing, smoothing

What is the most efficient way to filter (smooth) continuous streaming data


I am in the process of making my own system monitoring tool. I'm looking to run a filter (like a Gaussian filter or similar) on a continuous stream of raw data that I'm receiving from a device (my CPU % in this case).

The collection of data values is n elements long. Every time this piece of code runs, it appends the new CPU value and removes the oldest, keeping the collection at a length of n: essentially a deque([float('nan')] * n, maxlen=n), where n is the length of the graph I'm plotting to.

It then filters the whole collection through a Gaussian filter to create the smoothed data points and plots them, producing an animated graph similar to the CPU % graphs found in most system monitors.

This works just fine... However, there has to be a more efficient way to filter the incoming data than running a filter on the whole data set every time a new value is added (in my case the graph updates every 0.2 sec).

I can think of ways to do it without filtering the whole list, but I'm not sure they are very efficient. Is there anything out there in the signal-processing world that will work for me? Apologies if my explanation is a bit confusing; I'm very new to this.

from collections import deque
from scipy.ndimage.filters import gaussian_filter1d

n = 500  # however many points the graph shows
values = deque([float('nan')] * n, maxlen=n)  # rolling collection of raw cpu % samples

# Not my actual code, but hopefully it describes what I'm doing
def animate():  # called every 0.2 s to update the graph
    # ... other stuff
    values.append(get_new_val())  # get_new_val() returns the latest cpu % reading
    line.set_ydata(gaussian_filter1d(values, sigma=4))  # line = the line object used for graphing
    # ... other stuff
    graph_line(line)  # function that redraws the line

tl;dr: I'm looking for an optimized way to smooth raw streaming data instead of filtering the whole data set on every pass.


Solution

  • I've never used one, but what you need sounds like what a Savitzky–Golay filter is for. It is a local smoothing filter that can be used to make data more differentiable (and to differentiate it, while we're at it).

    The good news is that scipy supports this filter as of version 0.14. The relevant part of the documentation:

    scipy.signal.savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=-1, mode='interp', cval=0.0)
    
      Apply a Savitzky-Golay filter to an array.
      This is a 1-d filter. If x has dimension greater than 1, axis determines the axis along which the filter is applied.
      Parameters:   
    
      x : array_like
          The data to be filtered. If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.
      window_length : int
          The length of the filter window (i.e. the number of coefficients). window_length must be a positive odd integer.
      polyorder : int
          The order of the polynomial used to fit the samples. polyorder must be less than window_length.
      [...]
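
    For a quick sanity check, here is a minimal (hypothetical) usage example on a noisy array; window_length=21 and polyorder=3 are arbitrary starting values, not recommendations:

    import numpy as np
    from scipy.signal import savgol_filter

    # Noisy sample data standing in for a cpu % trace
    x = np.linspace(0, 4 * np.pi, 200)
    noisy = np.sin(x) + np.random.normal(scale=0.3, size=x.size)

    # window_length must be a positive odd integer; polyorder must be < window_length
    smoothed = savgol_filter(noisy, window_length=21, polyorder=3)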
    

    I would first determine a small polynomial order and window size. Instead of working with the full n data points, you only need to smooth a much smaller deque of roughly window_length elements. As each new data point comes in, append it to the small deque, apply the Savitzky–Golay filter, take the newly filtered point, and append that to your graph, as sketched below.
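
    A minimal sketch of that idea, assuming scipy >= 0.14 (window_length, polyorder, and get_new_val() are placeholder values/names):

    from collections import deque
    import numpy as np
    from scipy.signal import savgol_filter

    window_length = 21  # must be a positive odd integer; tune to taste
    polyorder = 3       # must be less than window_length
    window = deque(maxlen=window_length)  # small rolling buffer of raw samples

    def smoothed_point(new_val):
        """Add one raw sample and return one smoothed sample."""
        window.append(new_val)
        if len(window) < window_length:
            return new_val  # not enough history yet; pass the raw value through
        # Filter only the small window and keep its newest point
        return savgol_filter(np.asarray(window), window_length, polyorder)[-1]

    Since the window size is fixed, each update now costs a small constant amount of work instead of a pass over all n points. (If you need to squeeze out even more, scipy.signal.savgol_coeffs can precompute the filter coefficients once, reducing each update to a single dot product.)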

    Note, however, that the method seems to be best-defined away from the edges of the data set. This might mean that, for accuracy's sake, you may have to introduce a delay of a few measurements, so that you are always using points that lie well inside a window (that is, for a given time point you likely need "future" data points to get a reliable filtered value). Considering that your data is measured five times every second, this might be a reasonable compromise if necessary.
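
    Continuing the sketch above (reusing its window, window_length, and polyorder), a hypothetical delayed variant keeps the centre point of each filtered window rather than the newest one, trading a lag of window_length // 2 samples (two seconds at five samples per second with window_length = 21) for a better-conditioned estimate:

    def smoothed_point_delayed(new_val):
        """Add one raw sample; return the smoothed value at the window centre."""
        window.append(new_val)
        if len(window) < window_length:
            return new_val  # still warming up; pass the raw value through
        # Interior points are fitted with full, centred neighbourhoods,
        # so the centre value is the most reliable estimate in the window
        filtered = savgol_filter(np.asarray(window), window_length, polyorder)
        return filtered[window_length // 2]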