I have a time series that represents the X and Z coordinates in a virtual environment.
X = np.array(df["X"])
Z = np.array(df["Z"])
Both the X and the Z coordinates contain noise from different sources. To filter out the noise I'd like to use a Fourier transform. After some research I used the code from https://medium.com/swlh/5-tips-for-working-with-time-series-in-python-d889109e676d to denoise my data.
def fft_denoiser(x, n_components, to_real=True):
    n = len(x)
    # compute the FFT
    fft = np.fft.fft(x, n)
    # compute the power spectral density:
    # squared magnitude of each FFT coefficient
    PSD = fft * np.conj(fft) / n
    # keep only the frequencies whose power exceeds the threshold
    _mask = PSD > n_components
    fft = _mask * fft
    # inverse Fourier transform
    clean_data = np.fft.ifft(fft)
    if to_real:
        clean_data = clean_data.real
    return clean_data
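For context, this is roughly how I call it on my coordinates (the value passed as n_components is just an example; it acts as a power threshold in the code above):
X_clean = fft_denoiser(X, n_components=10)  # example threshold, needs tuning
Z_clean = fft_denoiser(Z, n_components=10)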
After setting n_components, I'd like to use the cleaned data, which works pretty well, as I plotted for the X-coordinate:
Only at the beginning and the end, the cleaned data suddenly moves towards each other's values... Can somebody help or explain to me what causes this, and how I can overcome it?
The reason you're having this issue is that the FFT implicitly assumes the provided input signal is periodic. If you repeat your raw data, you will see that at each period there is a large discontinuity (as the signal goes from ~20 back down to ~5). Once some higher-frequency components are removed, you see a slightly less sharp discontinuity at the edges (a few samples at the start and a few samples at the end).
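To make the wrap-around explicit, here is a tiny sketch (assuming x is the raw 1-D signal) of what "repeating the data" means and where the discontinuity comes from:
import numpy as np

# the FFT treats x as one period of an infinitely repeating signal,
# so conceptually the data is laid end to end
x_periodic = np.tile(x, 3)

# the discontinuity at each period boundary is the jump from the last sample back to the first
print("jump at the period boundary:", x[0] - x[-1])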
To avoid this situation, you can do the filtering in the time-domain using a linear FIR filter, which can process the data sequence without the periodicity assumption.
For the purpose of this answer I constructed a synthetic test signal (which you can use to recreate the same conditions), but you can obviously use your own data instead:
import numpy as np

# Generate synthetic signal for testing purposes
fs = 1  # Hz
f0 = 0.002/fs
f1 = 0.01/fs
dt = 1/fs
t = np.arange(200, 901)*dt
m = (25-5)/(t[-1]-t[0])
phi = 4.2
x = 5 + m*(t-t[0]) + 2*np.sin(2*np.pi*f0*t) + 1*np.sin(2*np.pi*f1*t+phi) + 0.2*np.random.randn(len(t))
Now, to design the filter, we can take the inverse transform of the _mask (instead of applying the mask):
import numpy as np

# Design denoising filter
def freq_sampling_filter(x, threshold):
    n = len(x)
    # compute the FFT
    fft = np.fft.fft(x, n)
    # compute the power spectral density:
    # squared magnitude of each FFT coefficient
    PSD = fft * np.conj(fft) / n
    # keep frequencies with large contributions
    _mask = PSD > threshold
    # the filter's impulse response is the inverse transform of the mask
    _coff = np.fft.fftshift(np.real(np.fft.ifft(_mask)))
    return _coff
threshold = 10  # tunable, see below
coff = freq_sampling_filter(x, threshold)
The threshold is a tunable parameter, chosen to retain the frequency components you want while discarding the unwanted ones. That is of course highly subjective.
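If it helps, one way to get a feel for a sensible value (just a suggestion, using the synthetic signal above) is to look at the power spectrum and see where the dominant components sit relative to the noise floor:
import numpy as np

# inspect the power spectrum to guide the choice of threshold
n = len(x)
fft = np.fft.fft(x, n)
PSD = np.real(fft * np.conj(fft) / n)
freqs = np.fft.fftfreq(n, d=dt)

# the threshold should sit below the peaks you want to keep and above the noise floor
order = np.argsort(PSD)[::-1]
for i in order[:5]:
    print(f"f = {freqs[i]:+.4f} Hz  PSD = {PSD[i]:.1f}")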
Then we can simply apply the filter with scipy.signal.filtfilt:
from scipy.signal import filtfilt
# apply the denoising filter
cleaned = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
For the purpose of illustration, using a threshold of 10 with the above generated synthetic signal yields the following raw data (variable x) and cleaned data (variable cleaned):
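That comparison plot can be recreated along these lines (a minimal matplotlib sketch using t, x and cleaned from above):
import matplotlib.pyplot as plt

plt.plot(t, x, label='raw data (x)')
plt.plot(t, cleaned, label='cleaned data')
plt.xlabel('time [s]')
plt.ylabel('amplitude')
plt.legend()
plt.show()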
The choice of padtype='constant' ensures that the filtered values start and end at the start and end values of the unfiltered data.
Alternative
As was posted in the comments, filtfilt may be expensive for longer data sets.
As an alternative, the filtering can be performed using FFT-based convolution with scipy.signal.fftconvolve. Note that in this case there is no equivalent of the padtype argument of filtfilt, so we need to manually pad the signal to avoid edge effects at the start and end.
from scipy.signal import fftconvolve

n = len(x)
# Manually pad signal to avoid edge effects
x_padded = np.concatenate((x[0]*np.ones(n-1), x, x[-1]*np.ones((n-1)//2)))
# Filter using FFT-based convolution
cleaned = fftconvolve(x_padded, coff, mode='same')
# Extract result (remove data from padding)
cleaned = cleaned[2*(n-1)//2:-n//2+1]
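To compare the two approaches on the same data, a quick check along these lines can be run (assuming coff, x, x_padded and n from the snippets above):
import numpy as np
from scipy.signal import filtfilt, fftconvolve

# filter once with filtfilt and once with FFT-based convolution
cleaned_ff = filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant')
cleaned_fc = fftconvolve(x_padded, coff, mode='same')[2*(n-1)//2:-n//2+1]

# compare over the common length; some deviation is expected at the edges
# and because filtfilt applies the filter both forwards and backwards
m_len = min(len(cleaned_ff), len(cleaned_fc))
print("max abs difference:", np.max(np.abs(cleaned_ff[:m_len] - cleaned_fc[:m_len])))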
For reference, here are some benchmark comparisons (timings in seconds, so smaller is better) for the above signal of length 700:
filtfilt : 0.3831593
fftconvolve : 0.00028040000000029153
Note that relative performance would vary, but FFT-based convolution is expected to perform comparatively better as the signal gets longer.
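For what it's worth, timings like these can be reproduced roughly as follows (a sketch using timeit; absolute numbers will differ per machine and per run):
import timeit
from scipy.signal import filtfilt, fftconvolve

# time both approaches on the same signal (coff, x and x_padded as above)
t_ff = timeit.timeit(lambda: filtfilt(coff, 1, x, padlen=len(x)-1, padtype='constant'), number=10) / 10
t_fc = timeit.timeit(lambda: fftconvolve(x_padded, coff, mode='same'), number=10) / 10
print(f"filtfilt    : {t_ff}")
print(f"fftconvolve : {t_fc}")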