Tags: matlab, performance, linear-regression, calibration, large-data

Using low frequency data to calibrate high frequency data


I have a 10 Hz time series measured by a fast instrument and a 1 minute time series measured by a slow reference instrument. The data consists of a fluctuating meteorological parameter. The slow reference instrument is used to calibrate the fast instrument measurements. Both time series are synchronised.

My idea:

  • Average the 10 Hz data into 1 minute blocks.

  • Take 5 one-minute blocks from each time series and calculate the linear regression equation between them.

  • Use the regression equations to calibrate the 10 Hz data in 5 minute blocks (3000 data points).
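The idea above can be sketched in a few lines. The sketch below is in Python/NumPy (the MATLAB equivalents are `reshape`/`mean` for the block averaging and `polyfit` for the regression); all signal shapes, gains and noise levels are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 10                                   # fast instrument rate, Hz
t = np.arange(0, 300, 1 / fs)             # one 5-minute window (3000 samples)

# Hypothetical data: a slowly varying "true" concentration in ppb, a fast
# voltage signal that is linear in it, and the 5 one-minute reference means.
true_ppb = 40 + 0.5 * np.sin(2 * np.pi * t / 120)
hf = 0.05 * true_ppb + 0.01 + rng.normal(0, 0.002, len(t))   # fast signal, V
lf = true_ppb.reshape(5, 60 * fs).mean(axis=1)               # slow signal, ppb

# Average the 10 Hz data into 1-minute blocks (600 samples each).
hf_blocks = hf.reshape(5, 60 * fs).mean(axis=1)

# Linear regression lf = gain * hf_block + offset (MATLAB: polyfit).
gain, offset = np.polyfit(hf_blocks, lf, 1)

# Apply the fit to every 10 Hz sample in the 5-minute window.
hf_calibrated = gain * hf + offset
```

Because the calibration is a single linear map applied per 5-minute block, the turbulent fluctuations in the 10 Hz signal are scaled but not filtered.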

What would be the best way to match (calibrate) the high frequency data using the low frequency data? I use MATLAB.

More background: The fast instrument outputs a fluctuating voltage signal while the slow instrument outputs the true value of a trace gas concentration in ppb (parts per billion). The slow instrument samples every ten seconds and outputs the average every one minute.

In short, I would like to have my fast signal in ppb as well, but without losing its integrity (I need the turbulent fluctuations to remain unfiltered), hence the need for a linear fit.


Solution

  • Here's my approach and the results I got...

    I modelled the problem as there being

    • a real (unmeasured by instruments) signal. Let's call this real.
    • a slow signal - which is just the real signal sampled once a minute. Let's call this lf (short for low frequency).
    • a fast signal - real signal + noise + signal drift. Let's call this hf (short for high frequency).

    The task was to take the slow and fast signals and try to reconstruct the real signal, using least squares as the scoring metric.
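    A minimal way to generate such a trio of signals for experimenting (every shape, drift rate and noise level here is invented for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    fs = 10                                  # fast sample rate, Hz
    t = np.arange(0, 600, 1 / fs)            # 10 minutes at 10 Hz

    # real: the unmeasured truth (an arbitrary smooth shape)
    real = 40 + np.sin(2 * np.pi * t / 90) + 0.3 * np.sin(2 * np.pi * t / 25)

    # lf: the real signal sampled once per minute
    lf = real[:: 60 * fs]

    # hf: real signal + sensor noise + slow instrumental drift
    hf = real + rng.normal(0, 0.2, len(t)) + 0.002 * t
    ```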

    Strategy:

    • Define a "piecewise linear filter" - this takes a signal, and returns a piecewise version of it. (With each piecewise part occurring where the slow signal is measured.)
    • NOTE: The slow signal is considered piecewise anyway.
    • Define a forwards-backwards low pass filter.
    • Define "uncertainty" to be 0 at the points where the low frequency signal is measured. It linearly increases to 1 when the timestamp is halfway between low frequency signal measurements.
    • Now, take your high frequency signal and filter it with the low pass filter. Let's call this hf_lp
    • Take hf_lp and apply the "piecewise linear filter" to it. Let's call this hf_lp_pl
    • Subtract the last two from each other. I.e. hf_diff = hf_lp - hf_lp_pl.
    • You now want to find some function that estimates by how much hf_diff should be added to the low frequency signal (lf) such that the squared error between real_estimated and real is minimized. I fitted a function along the lines of real_estimated = lf + hf_diff.*(a1*uncertainty + a2*uncertainty.^2 + a3*uncertainty.^3)
    • Use fminsearch or other optimization techniques to get a1, a2, a3...
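    The steps above can be put together as follows. This is a sketch in Python/SciPy rather than MATLAB (`butter`/`filtfilt` and Nelder-Mead `minimize` correspond directly to MATLAB's `butter`/`filtfilt` and `fminsearch`, and `np.interp` to `interp1`); the synthetic signals, the 0.05 Hz cutoff and the padding length are assumptions standing in for real data:

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    fs = 10
    t = np.arange(0, 600, 1 / fs)
    real = 40 + np.sin(2 * np.pi * t / 90) + 0.3 * np.sin(2 * np.pi * t / 25)
    hf = real + rng.normal(0, 0.2, len(t)) + 0.002 * t   # fast: noise + drift
    lf_idx = np.arange(0, len(t), 60 * fs)               # slow-sample positions

    def piecewise_linear(sig):
        """Linear interpolation through the values at the slow-sample times."""
        return np.interp(t, t[lf_idx], sig[lf_idx])

    lf_full = piecewise_linear(real)          # slow signal, at 10 Hz

    # Forwards-backwards (zero-phase) low-pass filter; cutoff is a guess.
    b, a = butter(2, 0.05 / (fs / 2))
    hf_lp = filtfilt(b, a, hf, padlen=600)    # long padding: fewer edge transients
    hf_lp_pl = piecewise_linear(hf_lp)
    hf_diff = hf_lp - hf_lp_pl

    # Uncertainty: 0 at slow samples, rising linearly to 1 halfway between.
    phase = (np.arange(len(t)) % (60 * fs)) / (60 * fs)
    uncertainty = 1 - np.abs(2 * phase - 1)

    def estimate(p):
        a1, a2, a3 = p
        return lf_full + hf_diff * (a1 * uncertainty
                                    + a2 * uncertainty**2
                                    + a3 * uncertainty**3)

    def cost(p):
        return np.mean((estimate(p) - real) ** 2)

    # Nelder-Mead simplex search, i.e. what MATLAB's fminsearch does.
    res = minimize(cost, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
    real_estimated = estimate(res.x)
    ```

    Note that the piecewise-linear subtraction cancels any trend that is linear within a one-minute segment, which is why the slow drift in hf largely drops out of hf_diff.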

    Here is a sample plot of my results - you can see that real_estimated is much closer to real than the slow signal lf.

    [Plot: reconstructed estimate from the fast and slow signals]

    Closing thoughts...

    • The fast signal contains too much very-low-frequency (drift) and very-high-frequency (noise) content, but it has valuable medium-frequency information.
    • The slow signal has perfect low frequency information, but no medium frequency info.
    • The strategy above is really just one way of extracting the medium frequencies from the fast signal and adding it to the low frequency signal.
    • This way, we get the best of all worlds: low frequencies, medium frequencies and low noise.
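    That band-splitting view can be illustrated on its own: the difference of two zero-phase low-pass filters isolates the medium band. The signal and both cutoffs below are invented for illustration (in MATLAB the same thing is `butter` plus `filtfilt`):

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 10
    t = np.arange(0, 600, 1 / fs)
    rng = np.random.default_rng(3)

    # drift-band sine + medium-band sine + high-frequency noise
    sig = (np.sin(2 * np.pi * t / 100)
           + 0.5 * np.sin(2 * np.pi * t / 10)
           + rng.normal(0, 0.3, len(t)))

    def lowpass(x, fc):
        # 2nd-order Butterworth, applied forwards and backwards (zero phase)
        b, a = butter(2, fc / (fs / 2))
        return filtfilt(b, a, x, padlen=600)

    low = lowpass(sig, 0.02)         # very low frequencies: the drift band
    mid = lowpass(sig, 0.5) - low    # medium band: the turbulent information
    ```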