zero padded FFT using FFTW

To interpolate a signal in frequency domain, one can pad zeros in time domain and do an FFT.

Suppose the number of elements in a given vector X is N and Y is the same as X but padded one sided with N zeros. Then the following give the same result.

$$\hat{x}(k)=\sum_{n=0}^{2N-1} Y(n)e^{i2\pi k n/2N},\quad k=0,...,2N-1,$$
$$\hat{x}(k)=\sum_{n=0}^{ N-1} X(n)e^{i2\pi k n/2N},\quad k=0,...,2N-1.$$

Now if we use FFTW package, the first equation needs 2N memory space for the input vector while the second one needs only N memory space (I do not know if it is even possible to do in the existing FFTW package)! Also the computational complexity lowers from 2N^2log(2N) to 2N^2log(N). The problem is worse whenever we do a 2D FFT or 3D FFT. Is it possible to do the second approach using FFTW package? This is fairly easy to do in MATLAB though.

Solution

If x is a 2N signal padded with zeros above N , its DFT writes :

$\hat{x}(k)=\sum_{n=0}^{N-1}x(n)e^{\frac{2i\pi kn}{2N}}$

If k is even :

$eq2$

Hence, the coefficients of even frequencies arise from the N-point discrete Fourier transform of x(n).

if k is odd :

$eq3$

Hence, the coefficients of odd frequencies arise from the N-point discrete Fourier transform of x(n)exp(i*M_PI*n/N).

Thus, the discrete Fourier transform of a zero-padded 2N signal resumes to two DFT of signals of length N and fftw can be used to compute them.

The overall computation time will be 2*c*N*ln(N), where c is a constant. It is expected to be faster than the direct computation of the DFT c*2*N*ln(2*N). Remember that ln(2*N)=ln(2)+ln(N) : as N gets large, the extra work in case of direct computation is negligible compared to ln(N) : the trick becomes useless, even if the dimension is larger than one. It does not affect complexity.

Moreover, FFTW is really efficient, using lots of features of your PC if it is correctly installed, and it will be hard to do better than this in any case, even if the presented trick is used. Finally, if the input signal is real, you may use fftw_plan fftw_plan_dft_r2c_2d : only half the coefficients in the Fourier space are computed and stored.

Regarding memory requirements, if you are really short of memory, you can use the FFTW_IN_PLACE flag and use the same array for input and output. Yet, it is slightly slower.

The procedure presented above can be extended to compute the DFT of a LN signal of a N-point signal padded with (L-1)N zeros : it resumes to the computation of L DFTs of length N.

Do you have any reference showing how MATLAB handles and optimizes the DFT of padded signals compared to FFTW ?

EDIT : Further research about the 3D case :

The 3D DFT of a padded 3D signal x(n,m,p) is :

$eq4$

If k_n, k_m and k_p are even :

$eq5$

If k_n and k_m are even and k_p is odd :

$eq6$

...There are 8 cases.

So, the computation of the 3d dft of a 3D x of size NxNxN padded to 2Nx2Nx2N resumes to the computation of 8 3d dft of size NxNxN. Size a 3d dft is a combination of 3 1d dft, the total number of dft of size N is 3x8xNxN while the direct computation requires 3x(2N)*(2N) dft of size 2N. Computational time is 24cN^3ln(N) against 24cN^3ln(2N) : a small gain is possible...Again fftw is fast...

Yet, instead of using a black-box 3d fft, let's compute the 8 dfts of size N at once, by performing the 1d dfts in each direction.

1d dft along N : 2 cases, NxN dfts => 2cN^3ln(N)
1d dft along M : 2 cases, 2NxN dfts => 4cN^3ln(N)
1d dft along P : 2 cases, 2Nx2N dfts => 8cN^3ln(N)

Hence, the total computation time is expected to be 14cN^3ln(N) against 24cN^3ln(2N) : a small gain is possible...Again fftw is fast...

Moreover, the computation of

$eq7$

requires only a single call to exp : first compute w=exp(I*M_PI/N) then update wn=wn*w; x(n)=x(n)*wn or use pow if precision becomes an issue.