I have a 3D array which has a dimension (Nx, Ny, Nz).
I want to apply real FFT and IFFT along z-axis using FFTW3 library.
Here, 'z' is the fastest varying index.
I already write the same functional code with python using
numpy.fft.rfft and numpy.fft.irfft. It works exactly as I expected.
But it was too slow. So I tried to write code with C language.
I tried to compare the result of IFFT(FFT(f)) and f, where f is an arbitrary array.
I used the fft_plan_many_dft_r2c / fft_plan_many_dft_c2r for forward / backward FFT.
Here is my code.
(Compiled in Ubuntu 16.04 with gcc and -lfftw3 -lm options)
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>
void rfft_1d_of_3d(int Nx, int Ny, int Nz, double* input, double* output);
int main(){
double* input;
double* output;
int Nx=2, Ny=4, Nz=8;
int i,j,k,idx;
input = (double*) malloc(Nx*Ny*Nz*sizeof(double));
output = (double*) malloc(Nx*Ny*Nz*sizeof(double));
for(i=0; i<Nx; i++){
for(j=0; j<Ny; j++){
for(k=0; k<Nz; k++){
idx = k + j*Nz + i*Nz*Ny;
input[idx] = idx;
output[idx] = 0.;
}
}
}
rfft_1d_of_3d(Nx, Ny, Nz, input, output);
for(i=0; i<Nx; i++){
for(j=0; j<Ny; j++){
for(k=0; k<Nz; k++){
idx = k + j*Nz + i*Nz*Ny;
printf("%f, %f\n", input[idx], output[idx]);
}
}
}
return 0;
}
void rfft_1d_of_3d(int Nx, int Ny, int Nz, double* input, double* output){
int i,j,k,idx;
// Allocate memory space.
fftw_complex *FFTz = fftw_alloc_complex(Nx*Ny*(Nz/2+1));
// Set forward FFTz parameters
int rankz = 1;
int nz[] = {Nz};
const int *inembedz = NULL, *onembedz = NULL;
int istridez = 1, ostridez = 1;
int idistz = Nz, odistz= (Nz+2)/2;
int howmanyz = (Nx*Ny);
// Setup Forward plans.
fftw_plan FFTz_for_plan = fftw_plan_many_dft_r2c(rankz, nz, howmanyz, input, inembedz, istridez, idistz, FFTz, onembedz, ostridez, odistz, FFTW_ESTIMATE);
// Set backward FFTz parameters
int rankbz = 1;
int nbz[] = {(Nz+2)/2};
const int *inembedbz = NULL, *onembedbz = NULL;
int istridebz = 1, ostridebz = 1;
int idistbz = (Nz+2)/2, odistbz = Nz;
int howmanybz = (Nx*Ny);
// Setup Backward plans.
fftw_plan FFTz_bak_plan = fftw_plan_many_dft_c2r(rankbz, nbz, howmanybz, FFTz, inembedbz, istridebz, idistbz, output, onembedbz, ostridebz, odistbz, FFTW_ESTIMATE);
fftw_execute(FFTz_for_plan);
fftw_execute(FFTz_bak_plan);
fftw_free(FFTz);
return;
}
the input and output result is supposed to be the same, but it does not.
The input array was a 3D array (Nx=2, Ny=4, Nz=8),
[[[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 8. 9. 10. 11. 12. 13. 14. 15.]
[ 16. 17. 18. 19. 20. 21. 22. 23.]
[ 24. 25. 26. 27. 28. 29. 30. 31.]]
[[ 32. 33. 34. 35. 36. 37. 38. 39.]
[ 40. 41. 42. 43. 44. 45. 46. 47.]
[ 48. 49. 50. 51. 52. 53. 54. 55.]
[ 56. 57. 58. 59. 60. 61. 62. 63.]]]
and I flattened it as,
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63.]
I expected the output would be the same as the input but the actual result was
0.000000, 12.000000
1.000000, 8.929290
2.000000, 28.256139
3.000000, 35.743861
4.000000, 55.070710
5.000000, 0.000000
6.000000, 0.000000
7.000000, 0.000000
8.000000, 76.000000
9.000000, 72.929290
10.000000, 92.256139
11.000000, 99.743861
12.000000, 119.070710
13.000000, 0.000000
14.000000, 0.000000
15.000000, 0.000000
16.000000, 140.000000
17.000000, 136.929290
18.000000, 156.256139
19.000000, 163.743861
20.000000, 183.070710
21.000000, 0.000000
22.000000, 0.000000
23.000000, 0.000000
24.000000, 204.000000
25.000000, 200.929290
26.000000, 220.256139
27.000000, 227.743861
28.000000, 247.070710
29.000000, 0.000000
30.000000, 0.000000
31.000000, 0.000000
32.000000, 268.000000
33.000000, 264.929290
34.000000, 284.256139
35.000000, 291.743861
36.000000, 311.070710
37.000000, 0.000000
38.000000, 0.000000
39.000000, 0.000000
40.000000, 332.000000
41.000000, 328.929290
42.000000, 348.256139
43.000000, 355.743861
44.000000, 375.070710
45.000000, 0.000000
46.000000, 0.000000
47.000000, 0.000000
48.000000, 396.000000
49.000000, 392.929290
50.000000, 412.256139
51.000000, 419.743861
52.000000, 439.070710
53.000000, 0.000000
54.000000, 0.000000
55.000000, 0.000000
56.000000, 460.000000
57.000000, 456.929290
58.000000, 476.256139
59.000000, 483.743861
60.000000, 503.070710
61.000000, 0.000000
62.000000, 0.000000
63.000000, 0.000000
The left one is the elements of the input array and the right one is that of the output.
Where did I make a mistake?
For fftw_plan_many_dft_c2r()
: nbz
, the size of the transform, is to be set to Nz
, the size of the real output array. It is likely similar for fftw_plan_dft_c2r_1d()
.
// Set backward FFTz parameters
int rankbz = 1;
int nbz[1] = {(Nz)}; // here !!!
const int *inembedbz = NULL, *onembedbz = NULL;
//int inembedbz[1]={Nz/2+1};
//int onembedbz[1]={Nz};
int istridebz = 1, ostridebz = 1;
int idistbz = (Nz/2+1), odistbz = (Nz);
int howmanybz = (Nx*Ny);
// Setup Backward plans.
fftw_plan FFTz_bak_plan = fftw_plan_many_dft_c2r(rankbz, nbz, howmanybz, FFTz, inembedbz, istridebz, idistbz, output, onembedbz, ostridebz, odistbz, FFTW_ESTIMATE);
As the input
and output
array are used by FFTW transforms, it is advised to allocate them using fftw_malloc()
or a similar function, as you did for FFTz
. These functions ensure that requirements regarding memory alignment are met. See function *X(kernel_malloc)(size_t n)
in fftw-3.3.6-pl2/kernel/kalloc.c . It calls functions like memalign()
or _aligned_malloc()
among others. Both these two return NULL
just like malloc()
in case of failure.
There is a difference of scale. Indeed, chaining a forward and backward FFTW transforms of length Nz
results in a scaling by Nz
.
Fianlly, some FFTW transforms, such as c2r, are allowed to overwrite their input arrays, except if the flag FFTW_PRESERVE_INPUT is added:
FFTW_PRESERVE_INPUT specifies that an out-of-place transform must not change its input array. This is ordinarily the default, except for c2r and hc2r (i.e. complex-to-real) transforms for which FFTW_DESTROY_INPUT is the default. In the latter cases, passing FFTW_PRESERVE_INPUT will attempt to use algorithms that do not destroy the input, at the expense of worse performance; for multi-dimensional c2r transforms, however, no input-preserving algorithms are implemented and the planner will return NULL if one is requested.