Tags: python, pandas, filtering, smoothing, kalman-filter

How to generate estimates from a set of measurements using smoothing and filtering techniques


I am relatively new to smoothing and filtering sensor data and calculated values.

I would like to generate a curve like the one below from a pandas DataFrame of measurements over a given time axis.

My data looks something like this:

charge_cycle  cumulative_chargetime_Ah  calculated_res
           1                  0.002199        0.075790
           2                  0.003123        0.071475
           3                  0.007699        0.097084
           4                  0.012086        0.050456
           5                  0.016609        0.077575
         ...                       ...             ...
      123169                478.228427        0.110583
      123170                478.236834        0.139948
      123171                478.239822        0.121189
      123172                478.242608        0.144464
      123173                478.251933        0.115232

The output I want is something like the graph below. The blue, noisy calculated_res-like variable is what I currently have; it is evidently very noisy, and I need to filter it somehow to get a more usable variable. The red plot I know I can generate by interpolation or by fitting a 1D polynomial.
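Fitting a 1D polynomial for the red trend line takes only a few lines with NumPy. A minimal sketch, assuming the data sit in a DataFrame with the column names shown above; the values generated here are synthetic stand-ins (not the real measurements), and degree 3 is an arbitrary choice:

```python
import numpy as np
import pandas as pd
from numpy.polynomial import Polynomial

# Synthetic stand-in for the real data (hypothetical values): a slow upward
# trend in calculated_res plus heavy Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 478.25, 5000)
true_res = 0.08 + 0.05 * (x / x.max()) ** 2
df = pd.DataFrame({
    "cumulative_chargetime_Ah": x,
    "calculated_res": true_res + rng.normal(0.0, 0.02, x.size),
})

# Least-squares fit of a degree-3 polynomial. Polynomial.fit maps x onto
# [-1, 1] internally, which keeps the fit numerically well conditioned.
trend = Polynomial.fit(df["cumulative_chargetime_Ah"].to_numpy(),
                       df["calculated_res"].to_numpy(), deg=3)
df["trend"] = trend(df["cumulative_chargetime_Ah"].to_numpy())
```

With matplotlib installed, `df.plot(x="cumulative_chargetime_Ah", y=["calculated_res", "trend"])` draws the raw values and the fitted trend together for comparison.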

However, I am unsure how to generate the estimate, which is the thick blue scatter plot overlaying the raw data. Could I get some advice on how to compute this "estimated value"?

I think it has something to do with filtering, but I am unsure how to apply that to this use case.

[Figure: Graph of Estimate Values]


Solution

  • The first thing I'd try is some polynomial fits. It looks as though there are at least 10 turning points, so to reproduce those you'd need a polynomial of degree at least 11, but why stint? I'd try fitting polynomials of degree 12, 24, 36, ... and see how they look.

    Getting a bit fancier, you might want to read up on the Wiener filter.

    I don't think a Kalman filter is all that appropriate. For one thing, you'd definitely want to do Kalman smoothing as well: a smoother (which has access to all the data) will always do at least as well as a filter (which gets the data sequentially). But the real problem with a Kalman filter is that you need to specify a dynamic model, that is, how the 'state' at the next time step depends on the state at the previous one, and how well this model fits the signal is crucial in determining how well the filter performs. More onerously, some of the parameters of this dynamic model are stochastic in nature (for example, how the uncertainties in the states grow during a prediction step), and these are not only important in determining filter/smoother performance but also difficult to determine.
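The suggestion to try polynomials of degree 12, 24, 36 can be checked quickly. The sketch below runs on synthetic data with roughly 10 turning points (the signal shape, noise level and sample count are assumptions, not the asker's data). It uses NumPy's `Chebyshev.fit` instead of a plain power-basis fit, because high-degree fits in the power basis are badly conditioned:

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Hypothetical data with about 10 turning points, standing in for calculated_res.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 478.25, 5000)
true_res = 0.1 + 0.02 * np.sin(x / 15.0)
y = true_res + rng.normal(0.0, 0.02, x.size)

# Fit several degrees and record the RMS error against the raw data and,
# because the data are synthetic, against the known noise-free signal.
results = {}
for deg in (12, 24, 36):
    fit = Chebyshev.fit(x, y, deg)  # Chebyshev basis stays stable at high degree
    y_hat = fit(x)
    results[deg] = {
        "rms_vs_raw": float(np.sqrt(np.mean((y_hat - y) ** 2))),
        "rms_vs_truth": float(np.sqrt(np.mean((y_hat - true_res) ** 2))),
    }

for deg, r in results.items():
    print(f"degree {deg:2d}: rms vs raw {r['rms_vs_raw']:.4f}, "
          f"rms vs truth {r['rms_vs_truth']:.4f}")
```

The RMS error against the raw data can never increase as the degree goes up, so on real data it cannot by itself tell you when to stop; looking at the fitted curves, or holding out some points, is a more useful check. The comparison against the noise-free signal is only possible here because the data are synthetic.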
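For the Wiener filter, SciPy provides `scipy.signal.wiener`. Note that this is a local, adaptive form of the Wiener filter (it works from a sliding-window mean and variance), not the textbook frequency-domain version. A minimal sketch on synthetic data, assuming the samples are roughly evenly spaced along `cumulative_chargetime_Ah` (the filter works on sample index, not on the x values) and using an arbitrarily chosen 101-sample window:

```python
import numpy as np
from scipy.signal import wiener

# Hypothetical noisy series standing in for calculated_res (uniformly sampled).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 478.25, 5000)
true_res = 0.1 + 0.02 * np.sin(x / 15.0)
y = true_res + rng.normal(0.0, 0.02, x.size)

# scipy.signal.wiener shrinks each sample towards the local mean wherever the
# local variance is close to the noise power; with noise=None the noise power
# is estimated as the average local variance. Reflect-padding the ends avoids
# the zero padding the filter would otherwise apply at the boundaries.
window = 101                      # odd window length in samples (assumed)
half = window // 2
y_padded = np.pad(y, half, mode="reflect")
y_wiener = wiener(y_padded, mysize=window)[half:-half]

rms_raw = np.sqrt(np.mean((y - true_res) ** 2))
rms_wiener = np.sqrt(np.mean((y_wiener - true_res) ** 2))
print(f"RMS error vs noise-free signal: raw {rms_raw:.4f}, Wiener {rms_wiener:.4f}")
```

The window length plays the same role as the polynomial degree: a longer window removes more noise but smears out faster changes in the signal.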
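To make the point about dynamic models concrete, here is a minimal Kalman filter plus Rauch-Tung-Striebel (RTS) smoother for the simplest possible dynamic model, a random walk ("local level"). The function name `kalman_local_level` and the values of `q` (process noise variance) and `r` (measurement noise variance) are illustrative assumptions; choosing `q` is exactly the hard stochastic-parameter problem described in the answer, and the data are again synthetic:

```python
import numpy as np


def kalman_local_level(z, q, r, x0, p0):
    """Kalman filter and RTS smoother for a random-walk ("local level") model.

    State model:       x[k] = x[k-1] + w[k],  w[k] ~ N(0, q)
    Measurement model: z[k] = x[k] + v[k],    v[k] ~ N(0, r)
    x0, p0: prior mean and variance of the state before the first sample.
    Returns (filtered estimates, smoothed estimates).
    """
    n = len(z)
    x_pred, p_pred = np.empty(n), np.empty(n)
    x_filt, p_filt = np.empty(n), np.empty(n)
    x, p = x0, p0
    for k in range(n):
        # Predict: a random walk keeps the state; its variance grows by q.
        x_pred[k], p_pred[k] = x, p + q
        # Update: blend prediction and measurement via the Kalman gain.
        gain = p_pred[k] / (p_pred[k] + r)
        x = x_pred[k] + gain * (z[k] - x_pred[k])
        p = (1.0 - gain) * p_pred[k]
        x_filt[k], p_filt[k] = x, p
    # Backward RTS pass: each estimate is corrected using later samples.
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
    return x_filt, x_smooth


# Synthetic data (hypothetical): a slowly oscillating signal plus noise.
rng = np.random.default_rng(3)
x_ah = np.linspace(0.0, 478.25, 5000)
true_res = 0.1 + 0.02 * np.sin(x_ah / 15.0)
z = true_res + rng.normal(0.0, 0.02, x_ah.size)

# q and r are tuning assumptions; r equals the true noise variance only
# because the data are synthetic. On real data both must be estimated.
filt, smooth = kalman_local_level(z, q=1e-6, r=0.02 ** 2, x0=z[0], p0=1.0)
for name, est in (("raw", z), ("filter", filt), ("smoother", smooth)):
    rms = np.sqrt(np.mean((est - true_res) ** 2))
    print(f"{name:8s} RMS error vs noise-free signal: {rms:.4f}")
```

Re-running with `q` a few orders of magnitude smaller or larger shows how sensitive the result is to this one parameter: too small and the estimate lags behind the signal, too large and it follows the noise.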