I have a time series that represents the X and Z coordinates in a virtual environment.
X = np.array(df["X"])
Z = np.array(df["Z"])
Both the X and the Z coordinates contain noise from different sources. To filter out the noise I'd like to use a Fourier transform. After some research I used the code from https://medium.com/swlh/5-tips-for-working-with-time-series-in-python-d889109e676d to denoise my data.
def fft_denoiser(x, n_components, to_real=True):
    n = len(x)
    # compute the FFT
    fft = np.fft.fft(x, n)
    # compute the power spectral density:
    # squared magnitude of each FFT coefficient
    PSD = fft * np.conj(fft) / n
    # keep only the frequencies whose power exceeds the threshold
    _mask = PSD > n_components
    fft = _mask * fft
    # inverse Fourier transform
    clean_data = np.fft.ifft(fft)
    if to_real:
        clean_data = clean_data.real
    return clean_data
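For context, this is roughly how I call it on my coordinates (the value passed as n_components is just an example; it acts as a power threshold in the code above):
X_clean = fft_denoiser(X, n_components=10)  # example threshold, needs tuning
Z_clean = fft_denoiser(Z, n_components=10)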
After setting n_components, I'd like to use the cleaned data, which works pretty well, as I plotted for the X-coordinate:
Only at the beginning and the end, the cleaned data suddenly moves towards each other's values... Can somebody help or explain to me what causes this, and how I can overcome it?
The reason you're having this issue is that the FFT implicitly assumes the provided input signal is periodic. If you repeat your raw data, you will see that at each period there is a large discontinuity (as the signal goes from ~20 back down to ~5). Once some higher-frequency components are removed, you see a slightly less sharp discontinuity at the edges (a few samples at the start and a few samples at the end).
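To make the wrap-around explicit, here is a tiny sketch (assuming x is the raw 1-D signal) of what "repeating the data" means and where the discontinuity comes from:
import numpy as np

# the FFT treats x as one period of an infinitely repeating signal,
# so conceptually the data is laid end to end
x_periodic = np.tile(x, 3)

# the discontinuity at each period boundary is the jump from the last sample back to the first
print("jump at the period boundary:", x[0] - x[-1])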
To avoid this situation, you can do the filtering in the time-domain using a linear FIR filter, which can process the data sequence without the periodicity assumption.
For the purpose of this answer I constructed a synthetic test signal (which you can use to recreate the same conditions), but you can obviously use your own data instead:
import numpy as np

# Generate synthetic signal for testing purposes
fs = 1  # Hz
f0 = 0.002/fs
f1 = 0.01/fs
dt = 1/fs
t = np.arange(200, 901)*dt
m = (25-5)/(t[-1]-t[0])
phi = 4.2
x = 5 + m*(t-t[0]) + 2*np.sin(2*np.pi*f0*t) + 1*np.sin(2*np.pi*f1*t+phi) + 0.2*np.random.randn(len(t))
Now, to design the filter, we can take the inverse transform of the _mask (instead of applying the mask):
import numpy as np

# Design denoising filter
def freq_sampling_filter(x, threshold):
    n = len(x)
    # compute the FFT
    fft = np.fft.fft(x, n)
    # compute the power spectral density:
    # squared magnitude of each FFT coefficient
    PSD = fft * np.conj(fft) / n
    # keep frequencies with large contributions
    _mask = PSD > threshold
    # the filter's impulse response is the inverse transform of the mask
    _coff = np.fft.fftshift(np.real(np.fft.ifft(_mask)))
    return _coff
threshold = 10  # tunable, see below
coff = freq_sampling_filter(x, threshold)
The threshold is a tunable parameter, chosen to retain the frequency components you want while discarding the unwanted ones. That is of course highly subjective.
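If it helps, one way to get a feel for a sensible value (just a suggestion, using the synthetic signal above) is to look at the power spectrum and see where the dominant components sit relative to the noise floor:
import numpy as np

# inspect the power spectrum to guide the choice of threshold
n = len(x)
fft = np.fft.fft(x, n)
PSD = np.real(fft * np.conj(fft) / n)
freqs = np.fft.fftfreq(n, d=dt)

# the threshold should sit below the peaks you want to keep and above the noise floor
order = np.argsort(PSD)[::-1]
for i in order[:5]:
    print(f"f = {freqs[i]:+.4f} Hz  PSD = {PSD[i]:.1f}")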
Then we can simply apply the filter with scipy.signal.filtfilt:
from scipy.signal import filtfilt
# apply the denoising filter
cleaned = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
For the purpose of illustration, using a threshold of 10 with the above generated synthetic signal yields the following raw data (variable x) and cleaned data (variable cleaned):
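That comparison plot can be recreated along these lines (a minimal matplotlib sketch using t, x and cleaned from above):
import matplotlib.pyplot as plt

plt.plot(t, x, label='raw data (x)')
plt.plot(t, cleaned, label='cleaned data')
plt.xlabel('time [s]')
plt.ylabel('amplitude')
plt.legend()
plt.show()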
The choice of padtype='constant' ensures that the filtered values start and end at the start and end values of the unfiltered data.
Alternative
As was posted in the comments, filtfilt may be expensive for longer data sets.
As an alternative, the filtering can be performed using FFT-based convolution with scipy.signal.fftconvolve. Note that in this case there is no equivalent of the padtype argument of filtfilt, so we need to manually pad the signal to avoid edge effects at the start and end.
from scipy.signal import fftconvolve

n = len(x)
# Manually pad signal to avoid edge effects
x_padded = np.concatenate((x[0]*np.ones(n-1), x, x[-1]*np.ones((n-1)//2)))
# Filter using FFT-based convolution
cleaned = fftconvolve(x_padded, coff, mode='same')
# Extract result (remove data from padding)
cleaned = cleaned[2*(n-1)//2:-n//2+1]
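To compare the two approaches on the same data, a quick check along these lines can be run (assuming coff, x, x_padded and n from the snippets above):
import numpy as np
from scipy.signal import filtfilt, fftconvolve

# filter once with filtfilt and once with FFT-based convolution
cleaned_ff = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
cleaned_fc = fftconvolve(x_padded, coff, mode='same')[2*(n-1)//2:-n//2+1]

# compare over the common length; some deviation is expected at the edges
# and because filtfilt applies the filter both forwards and backwards
m_len = min(len(cleaned_ff), len(cleaned_fc))
print("max abs difference:", np.max(np.abs(cleaned_ff[:m_len] - cleaned_fc[:m_len])))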
For reference, here are some benchmark comparisons (timings in seconds, so smaller is better) for the above signal of length 700:
filtfilt : 0.3831593
fftconvolve : 0.00028040000000029153
Note that relative performance would vary, but FFT-based convolution is expected to perform comparatively better as the signal gets longer.
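For what it's worth, timings like these can be reproduced roughly as follows (a sketch using timeit; absolute numbers will differ per machine and per run):
import timeit
from scipy.signal import filtfilt, fftconvolve

# time both approaches on the same signal (coff, x and x_padded as above)
t_ff = timeit.timeit(lambda: filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant'), number=10) / 10
t_fc = timeit.timeit(lambda: fftconvolve(x_padded, coff, mode='same'), number=10) / 10
print(f"filtfilt    : {t_ff}")
print(f"fftconvolve : {t_fc}")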